csle_agents.agents.t_fp package

Submodules

csle_agents.agents.t_fp.t_fp_agent module

class csle_agents.agents.t_fp.t_fp_agent.TFPAgent(defender_simulation_env_config: SimulationEnvConfig, attacker_simulation_env_config: SimulationEnvConfig, emulation_env_config: Union[None, EmulationEnvConfig], experiment_config: ExperimentConfig, training_job: Optional[TrainingJobConfig] = None)[source]

Bases: BaseAgent

RL agent implementing the T-FP algorithm from Hammar & Stadler (2023), "Learning Near-Optimal Intrusion Responses Against Dynamic Attackers"

attacker_best_response(seed: int, defender_strategy: MixedMultiThresholdStoppingPolicy, attacker_strategy: MixedMultiThresholdStoppingPolicy) → Tuple[List[List[float]], float][source]

Learns a threshold best response strategy for the attacker against a given defender strategy

Parameters

seed – the random seed
defender_strategy – the defender strategy
attacker_strategy – the attacker strategy

Returns

the learned threshold strategy and its estimated value

static compute_avg_metrics(metrics: Dict[str, List[Union[float, int]]]) → Dict[str, Union[float, int]][source]

Computes the average metrics of a dict with aggregated metrics

Parameters: metrics – the dict with the aggregated metrics
Returns: the average metrics
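
The averaging step described above can be illustrated with a minimal sketch. It assumes that "average" means the arithmetic mean of each metric's list and that empty lists are skipped; the function name compute_avg_metrics_sketch is hypothetical and is not part of csle_agents.

```python
from typing import Dict, List, Union

Number = Union[float, int]


def compute_avg_metrics_sketch(metrics: Dict[str, List[Number]]) -> Dict[str, float]:
    """Return the arithmetic mean of each metric's list of values."""
    avg_metrics: Dict[str, float] = {}
    for name, values in metrics.items():
        # Assumption of this sketch: metrics with no recorded values are skipped
        if len(values) > 0:
            avg_metrics[name] = sum(values) / len(values)
    return avg_metrics


aggregated = {"R": [1.0, 2.0, 3.0], "T": [10, 20]}
averages = compute_avg_metrics_sketch(aggregated)
print(averages)  # {'R': 2.0, 'T': 15.0}
```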

defender_best_response(seed: int, attacker_strategy: MixedMultiThresholdStoppingPolicy) → Tuple[List[float], float][source]

Learns a best response for the defender against a given attacker strategy

Parameters

seed – the random seed
attacker_strategy – the attacker strategy

Returns

the learned thresholds and the value

evaluate_attacker_policy(attacker_thresholds: List[List[float]], defender_strategy: MixedMultiThresholdStoppingPolicy, attacker_strategy: MixedMultiThresholdStoppingPolicy) → Dict[str, Union[float, int]][source]

Monte-Carlo evaluation of the game value of a given attacker policy against the average defender strategy

Parameters

attacker_thresholds – the attacker strategy to evaluate
defender_strategy – the average defender strategy
attacker_strategy – the average attacker strategy

Returns

the average reward

evaluate_defender_policy(defender_thresholds: List[float], attacker_strategy: MixedMultiThresholdStoppingPolicy) → Dict[str, Union[float, int]][source]

Monte-Carlo evaluation of the game value of a given defender policy against the average attacker strategy

Parameters

defender_thresholds – the defender strategy to evaluate
attacker_strategy – the average attacker strategy

Returns

the average reward

evaluate_strategy_profile(defender_strategy: MixedMultiThresholdStoppingPolicy, attacker_strategy: MixedMultiThresholdStoppingPolicy) → Dict[str, Union[float, int]][source]

Monte-Carlo evaluation of the game value following a given strategy profile

Parameters

defender_strategy – the average defender strategy
attacker_strategy – the average attacker strategy

Returns

the average reward

static exploitability(attacker_val: float, defender_val: float) → float[source]

Computes the exploitability metric given the value of the attacker when following a best response against the current defender strategy and the value of the defender when following a best response against the current attacker strategy.

Parameters

attacker_val – the value of the attacker when following a best response against the current defender strategy
defender_val – the value of the defender when following a best response against the current attacker strategy

Returns

the exploitability
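
As an illustration only, the sketch below computes one common form of this metric for a zero-sum game. It assumes the two best-response values are signed so that they cancel at equilibrium, which makes the exploitability the absolute value of their sum. This page does not confirm the sign convention or the exact formula used by TFPAgent.exploitability, and the name exploitability_sketch is hypothetical.

```python
def exploitability_sketch(attacker_val: float, defender_val: float) -> float:
    """
    Distance from equilibrium under the assumed sign convention:
    the two best-response values sum to zero at a Nash equilibrium.
    """
    return abs(attacker_val + defender_val)


# Best-response values that do not cancel indicate an exploitable strategy profile
print(exploitability_sketch(attacker_val=-4.0, defender_val=5.5))  # 1.5
# Values that cancel exactly indicate an equilibrium under this convention
print(exploitability_sketch(attacker_val=-5.0, defender_val=5.0))  # 0.0
```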

get_attacker_experiment_config() → ExperimentConfig[source]

Returns: the experiment configuration for learning a best response of the attacker

get_defender_experiment_config() → ExperimentConfig[source]

Returns: the experiment configuration for learning a best response of the defender

hparam_names() → List[str][source]

Returns: a list with the hyperparameter names

static round_vec(vec) → List[float][source]

Rounds a vector to 3 decimals

Parameters: vec – the vector to round
Returns: the rounded vector
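
A minimal sketch of the rounding described above, using Python's built-in round. The name round_vec_sketch is hypothetical; the actual implementation may differ in detail.

```python
from typing import Iterable, List


def round_vec_sketch(vec: Iterable[float]) -> List[float]:
    """Round every element of the vector to 3 decimals."""
    return [round(float(v), 3) for v in vec]


rounded = round_vec_sketch([0.12345, 0.98765, 1.0])
print(rounded)  # [0.123, 0.988, 1.0]
```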

static running_average(x: List[float], N: int) → List[float][source]

Calculates the running average of the last N elements of vector x

Parameters

x – the vector
N – the number of elements to use for average calculation

Returns

the running average vector
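
The sketch below shows one way to compute such a running average. How the first N-1 positions are handled is an assumption here (they are averaged over the shorter window that is available); the actual method may treat them differently. The name running_average_sketch is hypothetical.

```python
from typing import List


def running_average_sketch(x: List[float], N: int) -> List[float]:
    """Average each element with up to N-1 of its predecessors."""
    if N <= 0:
        raise ValueError("N must be a positive integer")
    averaged: List[float] = []
    for i in range(len(x)):
        # The window covers the last N elements ending at index i,
        # or fewer near the start of the vector (assumption of this sketch)
        window = x[max(0, i - N + 1): i + 1]
        averaged.append(sum(window) / len(window))
    return averaged


smoothed = running_average_sketch([1.0, 2.0, 3.0, 4.0, 5.0], N=2)
print(smoothed)  # [1.0, 1.5, 2.5, 3.5, 4.5]
```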

t_fp(exp_result: ExperimentResult, seed: int, env: BaseEnv, training_job: TrainingJobConfig, random_seeds: List[int])[source]

Runs the T-FP algorithm (Hammar & Stadler, 2023)

Parameters

exp_result – the experiment result
seed – the seed for the experiment
env – environment for evaluation
training_job – the training job for the evaluation
random_seeds – the random seeds for the evaluation

Returns

the experiment result
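
To convey the general idea of a fictitious-play loop, the sketch below runs classical fictitious play on a small zero-sum matrix game (rock-paper-scissors). It is not the T-FP algorithm: T-FP learns threshold best responses (see attacker_best_response and defender_best_response), whereas this sketch computes exact best responses from a payoff matrix. All names, the payoff matrix, and the mapping of row and column players to defender and attacker are assumptions made for illustration.

```python
from typing import List, Tuple

# Payoffs of the row player (hypothetical "defender") in rock-paper-scissors;
# the column player (hypothetical "attacker") receives the negated payoff.
PAYOFFS = [[0, -1, 1],
           [1, 0, -1],
           [-1, 1, 0]]
NUM_ACTIONS = 3


def normalize(counts: List[int]) -> List[float]:
    """Convert action counts into an empirical (average) mixed strategy."""
    total = sum(counts)
    return [c / total for c in counts]


def row_best_response(col_strategy: List[float]) -> Tuple[int, float]:
    """Best pure action of the row player and its expected payoff."""
    values = [sum(PAYOFFS[i][j] * col_strategy[j] for j in range(NUM_ACTIONS))
              for i in range(NUM_ACTIONS)]
    best = max(range(NUM_ACTIONS), key=lambda i: values[i])
    return best, values[best]


def col_best_response(row_strategy: List[float]) -> Tuple[int, float]:
    """Best pure action of the column player and its expected payoff."""
    values = [-sum(PAYOFFS[i][j] * row_strategy[i] for i in range(NUM_ACTIONS))
              for j in range(NUM_ACTIONS)]
    best = max(range(NUM_ACTIONS), key=lambda j: values[j])
    return best, values[best]


def fictitious_play(iterations: int) -> Tuple[List[float], List[float], float]:
    """Each player repeatedly best-responds to the opponent's average strategy."""
    row_counts = [1, 0, 0]  # arbitrary initial action for the row player
    col_counts = [0, 1, 0]  # arbitrary initial action for the column player
    gap = float("inf")
    for _ in range(iterations):
        row_avg, col_avg = normalize(row_counts), normalize(col_counts)
        row_br, row_val = row_best_response(col_avg)
        col_br, col_val = col_best_response(row_avg)
        # Sum of best-response payoffs: zero only at an equilibrium of this game
        gap = row_val + col_val
        row_counts[row_br] += 1
        col_counts[col_br] += 1
    return normalize(row_counts), normalize(col_counts), gap


row_strategy, col_strategy, gap = fictitious_play(10000)
print(row_strategy, col_strategy, gap)
```

In this toy game the average strategies drift toward the uniform distribution and the gap shrinks as the number of iterations grows, which is the behaviour an exploitability metric is meant to track.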

train() → ExperimentExecution[source]

Performs the policy training for the given random seeds using T-FP

Returns: the training metrics and the trained policies

static update_metrics(metrics: Dict[str, List[Union[float, int]]], info: Dict[str, Union[float, int]]) → Dict[str, List[Union[float, int]]][source]

Updates a dict with aggregated metrics using new information from the environment

Parameters

metrics – the dict with the aggregated metrics
info – the new information

Returns

the updated dict
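
A minimal sketch of this aggregation, assuming each value in info is appended to the list stored under the same key and that unseen keys start a new list. The name update_metrics_sketch is hypothetical, and the actual method may handle missing keys differently.

```python
from typing import Dict, List, Union

Number = Union[float, int]


def update_metrics_sketch(metrics: Dict[str, List[Number]],
                          info: Dict[str, Number]) -> Dict[str, List[Number]]:
    """Append each new value from info to the matching list in metrics."""
    for key, value in info.items():
        # Assumption of this sketch: keys not seen before start a new list
        metrics.setdefault(key, []).append(value)
    return metrics


collected: Dict[str, List[Number]] = {}
collected = update_metrics_sketch(collected, {"R": 1.0, "T": 10})
collected = update_metrics_sketch(collected, {"R": 3.0, "T": 12})
print(collected)  # {'R': [1.0, 3.0], 'T': [10, 12]}
```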
