csle_agents.agents.dfsp_local package
Submodules
csle_agents.agents.dfsp_local.dfsp_local_agent module
- class csle_agents.agents.dfsp_local.dfsp_local_agent.DFSPLocalAgent(defender_simulation_env_config: csle_common.dao.simulation_config.simulation_env_config.SimulationEnvConfig, attacker_simulation_env_config: csle_common.dao.simulation_config.simulation_env_config.SimulationEnvConfig, emulation_env_config: Union[None, csle_common.dao.emulation_config.emulation_env_config.EmulationEnvConfig], ppo_experiment_config: csle_common.dao.training.experiment_config.ExperimentConfig, de_experiment_config: csle_common.dao.training.experiment_config.ExperimentConfig, vi_experiment_config: csle_common.dao.training.experiment_config.ExperimentConfig, training_job: Optional[csle_common.dao.jobs.training_job_config.TrainingJobConfig] = None)[source]
Bases:
csle_agents.agents.base.base_agent.BaseAgent
RL Agent implementing the local DFSP algorithm
- attacker_best_response(seed: int, defender_strategy: csle_common.dao.training.mixed_ppo_policy.MixedPPOPolicy, attacker_strategy: csle_common.dao.training.mixed_ppo_policy.MixedPPOPolicy) Tuple[csle_common.dao.training.ppo_policy.PPOPolicy, float] [source]
Learns a best response strategy for the attacker against a given defender strategy
- Parameters
seed – the random seed
defender_strategy – the defender strategy
attacker_strategy – the attacker strategy
- Returns
the learned best response strategy and the average return
- static compute_avg_metrics(metrics: Dict[str, List[Union[float, int]]]) Dict[str, Union[float, int]] [source]
Computes the average metrics of a dict with aggregated metrics
- Parameters
metrics – the dict with the aggregated metrics
- Returns
the average metrics
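A minimal sketch of what this averaging plausibly looks like; the per-key scheme below is an assumption, not the actual csle implementation:

    from typing import Dict, List, Union

    def compute_avg_metrics(metrics: Dict[str, List[Union[float, int]]]) -> Dict[str, Union[float, int]]:
        # Collapse each list of aggregated values into a single average per metric.
        return {key: sum(values) / len(values) for key, values in metrics.items()}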
- defender_best_response(seed: int, defender_strategy: csle_common.dao.training.mixed_ppo_policy.MixedPPOPolicy, attacker_strategy: csle_common.dao.training.mixed_ppo_policy.MixedPPOPolicy) Tuple[csle_common.dao.training.ppo_policy.PPOPolicy, float] [source]
Learns a best response for the defender against a given attacker strategy
- Parameters
seed – the random seed
defender_strategy – the defender strategy
attacker_strategy – the attacker strategy
- Returns
the learned best response strategy and the average return
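A hedged usage sketch of the two best-response calls; agent, defender_mix, and attacker_mix are hypothetical names for an instantiated DFSPLocalAgent and the two MixedPPOPolicy averages maintained by the self-play loop:

    # Learn best responses against the current average (mixed) strategies.
    attacker_br, attacker_avg_return = agent.attacker_best_response(
        seed=399, defender_strategy=defender_mix, attacker_strategy=attacker_mix)
    defender_br, defender_avg_return = agent.defender_best_response(
        seed=399, defender_strategy=defender_mix, attacker_strategy=attacker_mix)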
- evaluate_attacker_policy(defender_strategy: csle_common.dao.training.mixed_ppo_policy.MixedPPOPolicy, attacker_strategy: csle_common.dao.training.ppo_policy.PPOPolicy) Dict[str, Union[float, int]] [source]
Monte-Carlo evaluation of the game value of a given attacker strategy against the average defender strategy
- Parameters
defender_strategy – the average defender strategy
attacker_strategy – the attacker strategy to evaluate
- Returns
a dict with the average evaluation metrics, including the average reward
- evaluate_defender_policy(defender_strategy: csle_common.dao.training.linear_tabular_policy.LinearTabularPolicy, attacker_strategy: csle_common.dao.training.mixed_ppo_policy.MixedPPOPolicy) Dict[str, Union[float, int]] [source]
Monte-Carlo evaluation of the game value of a given defender policy against the average attacker strategy
- Parameters
defender_strategy – the defender strategy to evaluate
attacker_strategy – the average attacker strategy
- Returns
a dict with the average evaluation metrics, including the average reward
- evaluate_strategy_profile(defender_strategy: csle_common.dao.training.mixed_ppo_policy.MixedPPOPolicy, attacker_strategy: csle_common.dao.training.mixed_ppo_policy.MixedPPOPolicy) Dict[str, Union[float, int]] [source]
Monte-Carlo evaluation of the game value following a given strategy profile
- Parameters
defender_strategy – the average defender strategy
attacker_strategy – the average attacker strategy
- Returns
a dict with the average evaluation metrics, including the average reward
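All three evaluate_* methods follow the same Monte-Carlo pattern; below is an illustrative sketch under the assumption that episodes are rolled out and the resulting info dicts averaged (run_episode, env, and n_episodes are hypothetical; only update_metrics and compute_avg_metrics are part of the documented API):

    from csle_agents.agents.dfsp_local.dfsp_local_agent import DFSPLocalAgent

    def evaluate_profile_sketch(env, defender_strategy, attacker_strategy, n_episodes):
        # Hedged Monte-Carlo evaluation sketch (not the actual csle code).
        metrics = {}
        for _ in range(n_episodes):
            # run_episode is a hypothetical rollout helper returning the
            # environment's info dict for one episode.
            info = run_episode(env, defender_strategy, attacker_strategy)
            metrics = DFSPLocalAgent.update_metrics(metrics, info)
        return DFSPLocalAgent.compute_avg_metrics(metrics)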
- static exploitability(attacker_val: float, defender_val: float) float [source]
Computes the exploitability metric from two values: the attacker's value when following a best response against the current defender strategy, and the defender's value when following a best response against the current attacker strategy.
- Parameters
attacker_val – the value of the attacker when following a best response against the current defender strategy
defender_val – the value of the defender when following a best response against the current attacker strategy
- Returns
the exploitability
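In the zero-sum setting, the two best-response values cancel at a Nash equilibrium, so their sum measures the distance to equilibrium; a sketch assuming that standard convention (the exact formula used by csle may differ):

    def exploitability(attacker_val: float, defender_val: float) -> float:
        # Assumed zero-sum convention: at a Nash equilibrium the two
        # best-response values cancel, so |sum| measures the gap to it.
        return abs(attacker_val + defender_val)

    exploitability(12.4, -11.9)  # 0.5, i.e., close to equilibrium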
- local_dfsp(exp_result: csle_common.dao.training.experiment_result.ExperimentResult, seed: int, env: csle_common.dao.simulation_config.base_env.BaseEnv, training_job: csle_common.dao.jobs.training_job_config.TrainingJobConfig, random_seeds: List[int]) csle_common.dao.training.experiment_result.ExperimentResult [source]
Implements the logic of the local DFSP algorithm
- Parameters
exp_result – the experiment result
seed – the random seed of the experiment
env – the environment for the experiment
training_job – the training job for the experiment
random_seeds – the random seeds for the experiment
- Returns
the updated experiment result
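A high-level sketch of the fictitious self-play loop this method implements; the mixing step and stopping rule are assumptions based on the general DFSP scheme, not the concrete csle code:

    def local_dfsp_sketch(agent, defender_mix, attacker_mix, seed, n_iterations):
        # Hedged sketch of fictitious self-play (illustrative only).
        for _ in range(n_iterations):
            # 1. Learn best responses against the current average strategies.
            a_br, a_val = agent.attacker_best_response(seed, defender_mix, attacker_mix)
            d_br, d_val = agent.defender_best_response(seed, defender_mix, attacker_mix)
            # 2. Fold the best responses into the mixed (average) strategies;
            #    the append-based update is an assumption about MixedPPOPolicy.
            attacker_mix.policies.append(a_br)
            defender_mix.policies.append(d_br)
            # 3. Track convergence via the exploitability of the profile.
            if agent.exploitability(a_val, d_val) < 0.01:  # assumed threshold
                break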
- static round_vec(vec) List[float] [source]
Rounds each element of a vector to 3 decimals
- Parameters
vec – the vector to round
- Returns
the rounded vector
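A one-line sketch, assuming element-wise rounding:

    def round_vec(vec) -> list:
        # Round every element of the vector to 3 decimal places.
        return [round(float(v), 3) for v in vec]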
- static running_average(x: List[float], N: int) List[float] [source]
Calculates the running average of the last N elements of vector x
- Parameters
x – the vector
N – the number of elements to use for average calculation
- Returns
the running average vector
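A minimal sketch assuming a sliding window of at most N elements; the boundary handling in csle may differ:

    def running_average(x: list, N: int) -> list:
        # For each position, average the window of up to N trailing elements.
        out = []
        for i in range(len(x)):
            window = x[max(0, i - N + 1): i + 1]
            out.append(sum(window) / len(window))
        return out

    running_average([1.0, 2.0, 3.0, 4.0], N=2)  # [1.0, 1.5, 2.5, 3.5]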
- train() csle_common.dao.training.experiment_execution.ExperimentExecution [source]
Performs the policy training for the given random seeds using the local DFSP algorithm
- Returns
the training metrics and the trained policies
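A hedged end-to-end usage sketch; the configuration objects are assumed to have been built or loaded beforehand (e.g., from the csle metastore), and their variable names are hypothetical:

    from csle_agents.agents.dfsp_local.dfsp_local_agent import DFSPLocalAgent

    # defender_sim_cfg, attacker_sim_cfg, ppo_cfg, de_cfg, and vi_cfg are
    # assumed pre-built configuration objects.
    agent = DFSPLocalAgent(
        defender_simulation_env_config=defender_sim_cfg,
        attacker_simulation_env_config=attacker_sim_cfg,
        emulation_env_config=None,
        ppo_experiment_config=ppo_cfg,
        de_experiment_config=de_cfg,
        vi_experiment_config=vi_cfg)
    execution = agent.train()  # ExperimentExecution with metrics and trained policies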
- static update_metrics(metrics: Dict[str, List[Union[float, int]]], info: Dict[str, Union[float, int]]) Dict[str, List[Union[float, int]]] [source]
Updates a dict with aggregated metrics using new information from the environment
- Parameters
metrics – the dict with the aggregated metrics
info – the new information
- Returns
the updated dict
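A plausible implementation sketch; the setdefault-based aggregation is an assumption:

    from typing import Dict, List, Union

    def update_metrics(metrics: Dict[str, List[Union[float, int]]],
                       info: Dict[str, Union[float, int]]) -> Dict[str, List[Union[float, int]]]:
        # Append each new value from the environment's info dict to the
        # corresponding list of aggregated values.
        for key, value in info.items():
            metrics.setdefault(key, []).append(value)
        return metrics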
csle_agents.agents.dfsp_local.dfsp_local_ppo_agent module
- class csle_agents.agents.dfsp_local.dfsp_local_ppo_agent.DFSPLocalPPOAgent(defender_simulation_env_config: csle_common.dao.simulation_config.simulation_env_config.SimulationEnvConfig, attacker_simulation_env_config: csle_common.dao.simulation_config.simulation_env_config.SimulationEnvConfig, emulation_env_config: Union[None, csle_common.dao.emulation_config.emulation_env_config.EmulationEnvConfig], experiment_config: csle_common.dao.training.experiment_config.ExperimentConfig, training_job: Optional[csle_common.dao.jobs.training_job_config.TrainingJobConfig] = None)[source]
Bases:
csle_agents.agents.base.base_agent.BaseAgent
RL Agent implementing the local DFSP algorithm
- attacker_best_response(seed: int, defender_strategy: csle_common.dao.training.mixed_ppo_policy.MixedPPOPolicy, attacker_strategy: csle_common.dao.training.mixed_ppo_policy.MixedPPOPolicy) Tuple[csle_common.dao.training.ppo_policy.PPOPolicy, float] [source]
Learns a best response strategy for the attacker against a given defender strategy
- Parameters
seed – the random seed
defender_strategy – the defender strategy
attacker_strategy – the attacker strategy
- Returns
the learned best response strategy and the average return
- static compute_avg_metrics(metrics: Dict[str, List[Union[float, int]]]) Dict[str, Union[float, int]] [source]
Computes the average metrics of a dict with aggregated metrics
- Parameters
metrics – the dict with the aggregated metrics
- Returns
the average metrics
- defender_best_response(seed: int, defender_strategy: csle_common.dao.training.mixed_ppo_policy.MixedPPOPolicy, attacker_strategy: csle_common.dao.training.mixed_ppo_policy.MixedPPOPolicy) Tuple[csle_common.dao.training.ppo_policy.PPOPolicy, float] [source]
Learns a best response for the defender against a given attacker strategy
- Parameters
seed – the random seed
defender_strategy – the defender strategy
attacker_strategy – the attacker strategy
- Returns
the learned best response strategy and the average return
- evaluate_attacker_policy(defender_strategy: csle_common.dao.training.policy.Policy, attacker_strategy: csle_common.dao.training.policy.Policy) Dict[str, Union[float, int]] [source]
Monte-Carlo evaluation of the game value of a given attacker strategy against the average defender strategy
- Parameters
defender_strategy – the average defender strategy
attacker_strategy – the attacker strategy to evaluate
- Returns
a dict with the average evaluation metrics, including the average reward
- evaluate_defender_policy(defender_strategy: csle_common.dao.training.policy.Policy, attacker_strategy: csle_common.dao.training.policy.Policy) Dict[str, Union[float, int]] [source]
Monte-Carlo evaluation of the game value of a given defender policy against the average attacker strategy
- Parameters
defender_strategy – the defender strategy to evaluate
attacker_strategy – the average attacker strategy
- Returns
a dict with the average evaluation metrics, including the average reward
- evaluate_strategy_profile(defender_strategy: csle_common.dao.training.mixed_ppo_policy.MixedPPOPolicy, attacker_strategy: csle_common.dao.training.mixed_ppo_policy.MixedPPOPolicy) Dict[str, Union[float, int]] [source]
Monte-Carlo evaluation of the game value following a given strategy profile
- Parameters
defender_strategy – the average defender strategy
attacker_strategy – the average attacker strategy
- Returns
a dict with the average evaluation metrics, including the average reward
- static exploitability(attacker_val: float, defender_val: float) float [source]
Computes the exploitability metric from two values: the attacker's value when following a best response against the current defender strategy, and the defender's value when following a best response against the current attacker strategy.
- Parameters
attacker_val – the value of the attacker when following a best response against the current defender strategy
defender_val – the value of the defender when following a best response against the current attacker strategy
- Returns
the exploitability
- get_attacker_experiment_config() csle_common.dao.training.experiment_config.ExperimentConfig [source]
- Returns
the experiment configuration for learning a best response of the attacker
- get_defender_experiment_config() csle_common.dao.training.experiment_config.ExperimentConfig [source]
- Returns
the experiment configuration for learning a best response of the defender
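A brief usage sketch of the two getters; that they derive the per-player configurations from the single experiment_config passed to the constructor is an assumption (ppo_agent is the instance built in the sketch after train() below):

    attacker_cfg = ppo_agent.get_attacker_experiment_config()  # config for the attacker's best response
    defender_cfg = ppo_agent.get_defender_experiment_config()  # config for the defender's best response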
- local_dfsp(exp_result: csle_common.dao.training.experiment_result.ExperimentResult, seed: int, env: csle_common.dao.simulation_config.base_env.BaseEnv, training_job: csle_common.dao.jobs.training_job_config.TrainingJobConfig, random_seeds: List[int]) csle_common.dao.training.experiment_result.ExperimentResult [source]
Implements the local DFSP training logic
- Parameters
exp_result – the experiment result
seed – the seed for the experiments
env – the environment for the experiment
training_job – the training job
random_seeds – the random seeds for the experiment
- Returns
the updated experiment result
- static round_vec(vec) List[float] [source]
Rounds each element of a vector to 3 decimals
- Parameters
vec – the vector to round
- Returns
the rounded vector
- static running_average(x: List[float], N: int) List[float] [source]
Calculates the running average of the last N elements of vector x
- Parameters
x – the vector
N – the number of elements to use for average calculation
- Returns
the running average vector
- train() csle_common.dao.training.experiment_execution.ExperimentExecution [source]
Performs the policy training for the given random seeds using the local DFSP algorithm
- Returns
the training metrics and the trained policies
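Unlike DFSPLocalAgent, the PPO variant takes a single experiment configuration; a hedged instantiation sketch with hypothetical config variable names:

    from csle_agents.agents.dfsp_local.dfsp_local_ppo_agent import DFSPLocalPPOAgent

    # defender_sim_cfg, attacker_sim_cfg, and experiment_cfg are assumed
    # pre-built configuration objects.
    ppo_agent = DFSPLocalPPOAgent(
        defender_simulation_env_config=defender_sim_cfg,
        attacker_simulation_env_config=attacker_sim_cfg,
        emulation_env_config=None,
        experiment_config=experiment_cfg)
    execution = ppo_agent.train()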
- static update_metrics(metrics: Dict[str, List[Union[float, int]]], info: Dict[str, Union[float, int]]) Dict[str, List[Union[float, int]]] [source]
Updates a dict with aggregated metrics using new information from the environment
- Parameters
metrics – the dict with the aggregated metrics
info – the new information
- Returns
the updated dict