csle_agents.agents.dfsp_local package

Submodules

csle_agents.agents.dfsp_local.dfsp_local_agent module

class csle_agents.agents.dfsp_local.dfsp_local_agent.DFSPLocalAgent(defender_simulation_env_config: csle_common.dao.simulation_config.simulation_env_config.SimulationEnvConfig, attacker_simulation_env_config: csle_common.dao.simulation_config.simulation_env_config.SimulationEnvConfig, emulation_env_config: Union[None, csle_common.dao.emulation_config.emulation_env_config.EmulationEnvConfig], ppo_experiment_config: csle_common.dao.training.experiment_config.ExperimentConfig, de_experiment_config: csle_common.dao.training.experiment_config.ExperimentConfig, vi_experiment_config: csle_common.dao.training.experiment_config.ExperimentConfig, training_job: Optional[csle_common.dao.jobs.training_job_config.TrainingJobConfig] = None)[source]

Bases: csle_agents.agents.base.base_agent.BaseAgent

RL Agent implementing the local DFSP algorithm
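
A minimal usage sketch, assuming the simulation and experiment configuration objects have been constructed or fetched elsewhere (the variable names below are hypothetical placeholders):

    from csle_agents.agents.dfsp_local.dfsp_local_agent import DFSPLocalAgent

    # defender_sim_config, attacker_sim_config, ppo_config, de_config and
    # vi_config are hypothetical placeholders for configs loaded elsewhere
    agent = DFSPLocalAgent(
        defender_simulation_env_config=defender_sim_config,
        attacker_simulation_env_config=attacker_sim_config,
        emulation_env_config=None,  # optional; None when no emulation is used
        ppo_experiment_config=ppo_config,
        de_experiment_config=de_config,
        vi_experiment_config=vi_config,
    )
    execution = agent.train()  # runs local DFSP for all configured random seeds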

attacker_best_response(seed: int, defender_strategy: csle_common.dao.training.mixed_ppo_policy.MixedPPOPolicy, attacker_strategy: csle_common.dao.training.mixed_ppo_policy.MixedPPOPolicy) Tuple[csle_common.dao.training.ppo_policy.PPOPolicy, float][source]

Learns a best response strategy for the attacker against a given defender strategy

Parameters
  • seed – the random seed

  • defender_strategy – the defender strategy

  • attacker_strategy – the attacker strategy

Returns

the learned best response strategy and the average return

static compute_avg_metrics(metrics: Dict[str, List[Union[float, int]]]) Dict[str, Union[float, int]][source]

Computes the average metrics of a dict with aggregated metrics

Parameters

metrics – the dict with the aggregated metrics

Returns

the average metrics
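
A minimal sketch of the averaging step, assuming each metric maps to a list of scalar values collected over episodes:

    from typing import Dict, List, Union

    def compute_avg_metrics_sketch(
            metrics: Dict[str, List[Union[float, int]]]) -> Dict[str, Union[float, int]]:
        # Collapse each list of aggregated values into its mean
        return {k: sum(v) / len(v) for k, v in metrics.items() if len(v) > 0}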

defender_best_response(seed: int, defender_strategy: csle_common.dao.training.mixed_ppo_policy.MixedPPOPolicy, attacker_strategy: csle_common.dao.training.mixed_ppo_policy.MixedPPOPolicy) Tuple[csle_common.dao.training.ppo_policy.PPOPolicy, float][source]

Learns a best response for the defender against a given attacker strategy

Parameters
  • seed – the random seed

  • defender_strategy – the defender strategy

  • attacker_strategy – the attacker strategy

Returns

the learned best response strategy and the average return

evaluate_attacker_policy(defender_strategy: csle_common.dao.training.mixed_ppo_policy.MixedPPOPolicy, attacker_strategy: csle_common.dao.training.ppo_policy.PPOPolicy) Dict[str, Union[float, int]][source]

Monte-Carlo evaluation of the game value of a given attacker strategy against the average defender strategy

Parameters
  • defender_strategy – the average defender strategy

  • attacker_strategy – the attacker strategy to evaluate

Returns

the average reward

evaluate_defender_policy(defender_strategy: csle_common.dao.training.linear_tabular_policy.LinearTabularPolicy, attacker_strategy: csle_common.dao.training.mixed_ppo_policy.MixedPPOPolicy) Dict[str, Union[float, int]][source]

Monte-Carlo evaluation of the game value of a given defender policy against the average attacker strategy

Parameters
  • defender_strategy – the defender strategy to evaluate

  • attacker_strategy – the average attacker strategy

Returns

the average reward

evaluate_strategy_profile(defender_strategy: csle_common.dao.training.mixed_ppo_policy.MixedPPOPolicy, attacker_strategy: csle_common.dao.training.mixed_ppo_policy.MixedPPOPolicy) Dict[str, Union[float, int]][source]

Monte-Carlo evaluation of the game value following a given strategy profile

Parameters
  • defender_strategy – the average defender strategy

  • attacker_strategy – the average attacker strategy

Returns

the average reward

static exploitability(attacker_val: float, defender_val: float) float[source]

Computes the exploitability metric given the value of the attacker when following a best response against the current defender strategy and the value of the defender when following a best response against the current attacker strategy.

Parameters
  • attacker_val – the value of the attacker when following a best response against the current defender strategy

  • defender_val – the value of the defender when following a best response against the current attacker strategy

Returns

the exploitability
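
If the underlying game is zero-sum (as in the attacker-defender setting here), the two best-response values cancel exactly at a Nash equilibrium, so their sum measures the distance to equilibrium. A sketch under that assumption (the exact normalization used by the agent may differ):

    def exploitability_sketch(attacker_val: float, defender_val: float) -> float:
        # At a Nash equilibrium of a zero-sum game the best-response values
        # cancel; a positive sum quantifies how exploitable the profile is
        return abs(attacker_val + defender_val)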

hparam_names() List[str][source]

Returns

a list with the hyperparameter names

local_dfsp(exp_result: csle_common.dao.training.experiment_result.ExperimentResult, seed: int, env: csle_common.dao.simulation_config.base_env.BaseEnv, training_job: csle_common.dao.jobs.training_job_config.TrainingJobConfig, random_seeds: List[int]) csle_common.dao.training.experiment_result.ExperimentResult[source]

Implements the logic of the local DFSP algorithm

Parameters
  • exp_result – the experiment result

  • seed – the random seed of the experiment

  • env – the environment for the experiment

  • training_job – the training job for the experiment

  • random_seeds – the random seeds for the experiment

Returns

the updated experiment result
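
The fictitious self-play loop can be pictured roughly as follows. This is a hypothetical simplification: the function name, its parameters, and num_iterations are assumptions, and the real method also records metrics in exp_result and updates training_job:

    # Hypothetical sketch of the self-play loop; not the actual implementation
    def local_dfsp_sketch(agent, seed, defender_strategy, attacker_strategy,
                          num_iterations):
        for _ in range(num_iterations):
            # Each side learns a best response against the opponent's strategy
            attacker_br, attacker_val = agent.attacker_best_response(
                seed, defender_strategy, attacker_strategy)
            defender_br, defender_val = agent.defender_best_response(
                seed, defender_strategy, attacker_strategy)
            # Mix the new best responses into the average (mixed) strategies;
            # how MixedPPOPolicy stores its components is implementation-specific
            ...
            # Track convergence via the exploitability of the current profile
            expl = agent.exploitability(attacker_val, defender_val)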

static round_vec(vec) List[float][source]

Rounds a vector to 3 decimals

Parameters

vec – the vector to round

Returns

the rounded vector
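
A one-line sketch consistent with the documented behavior:

    from typing import List

    def round_vec_sketch(vec) -> List[float]:
        # Round every element of the vector to 3 decimals
        return [round(float(x), 3) for x in vec]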

static running_average(x: List[float], N: int) List[float][source]

Calculates the running average of the last N elements of vector x

Parameters
  • x – the vector

  • N – the number of elements to use for average calculation

Returns

the running average vector
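
A sketch of one common way to implement this with numpy; the handling of the first N-1 elements is an assumption:

    from typing import List
    import numpy as np

    def running_average_sketch(x: List[float], N: int) -> List[float]:
        if len(x) < N:
            # Too few elements for a full window: fall back to cumulative means
            return [float(np.mean(x[:i + 1])) for i in range(len(x))]
        y = np.array(x, dtype=float)
        # Replace each position from N-1 onwards with the mean of its
        # trailing window of N elements
        y[N - 1:] = np.convolve(x, np.ones(N) / N, mode="valid")
        return y.tolist()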

train() csle_common.dao.training.experiment_execution.ExperimentExecution[source]

Performs the policy training for the given random seeds using the local DFSP algorithm

Returns

the training metrics and the trained policies

static update_metrics(metrics: Dict[str, List[Union[float, int]]], info: Dict[str, Union[float, int]]) Dict[str, List[Union[float, int]]][source]

Updates a dict with aggregated metrics using new information from the environment

Parameters
  • metrics – the dict with the aggregated metrics

  • info – the new information

Returns

the updated dict
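
A minimal sketch of the aggregation step, assuming info holds scalar values keyed by metric name:

    from typing import Dict, List, Union

    def update_metrics_sketch(
            metrics: Dict[str, List[Union[float, int]]],
            info: Dict[str, Union[float, int]]) -> Dict[str, List[Union[float, int]]]:
        # Append each new scalar to its metric list, creating lists on first use
        for k, v in info.items():
            metrics.setdefault(k, []).append(v)
        return metrics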

csle_agents.agents.dfsp_local.dfsp_local_agent.reduce_R(R, strategy)[source]

Reduces the reward tensor based on a given strategy

Parameters
  • R – the reward tensor to reduce

  • strategy – the strategy to use for the reduction

Returns

the reduced reward tensor
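
A sketch of the reduction, assuming the reward tensor is indexed as R[a1, a2, s] and the strategy is given as a probability matrix strategy[s, a2]; these index conventions are assumptions:

    import numpy as np

    def reduce_R_sketch(R: np.ndarray, strategy: np.ndarray) -> np.ndarray:
        # Marginalize out the opponent's action dimension a2 by weighting
        # each reward entry with the strategy's action probability in state s
        num_a1, num_a2, num_s = R.shape
        reduced = np.zeros((num_a1, num_s))
        for a2 in range(num_a2):
            reduced += R[:, a2, :] * strategy[:, a2][np.newaxis, :]
        return reduced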

csle_agents.agents.dfsp_local.dfsp_local_agent.reduce_T(T, strategy)[source]

Reduces the transition tensor based on a given strategy

Parameters
  • T – the tensor to reduce

  • strategy – the strategy to use for the reduction

Returns

the reduced transition tensor
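
The same style of reduction for the transition tensor, under the assumed indexing T[a1, a2, s, s']:

    import numpy as np

    def reduce_T_sketch(T: np.ndarray, strategy: np.ndarray) -> np.ndarray:
        # Average over the opponent's actions a2 to obtain a single-agent
        # transition tensor T'[a1, s, s'] under the given strategy
        num_a1, num_a2, num_s, num_s_prime = T.shape
        reduced = np.zeros((num_a1, num_s, num_s_prime))
        for a2 in range(num_a2):
            reduced += T[:, a2, :, :] * strategy[:, a2][np.newaxis, :, np.newaxis]
        return reduced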

csle_agents.agents.dfsp_local.dfsp_local_ppo_agent module

class csle_agents.agents.dfsp_local.dfsp_local_ppo_agent.DFSPLocalPPOAgent(defender_simulation_env_config: csle_common.dao.simulation_config.simulation_env_config.SimulationEnvConfig, attacker_simulation_env_config: csle_common.dao.simulation_config.simulation_env_config.SimulationEnvConfig, emulation_env_config: Union[None, csle_common.dao.emulation_config.emulation_env_config.EmulationEnvConfig], experiment_config: csle_common.dao.training.experiment_config.ExperimentConfig, training_job: Optional[csle_common.dao.jobs.training_job_config.TrainingJobConfig] = None)[source]

Bases: csle_agents.agents.base.base_agent.BaseAgent

RL Agent implementing the local DFSP algorithm
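
Unlike DFSPLocalAgent, this variant takes a single experiment configuration. A minimal usage sketch, with the configuration objects assumed to be loaded elsewhere (variable names are hypothetical placeholders):

    from csle_agents.agents.dfsp_local.dfsp_local_ppo_agent import DFSPLocalPPOAgent

    # defender_sim_config, attacker_sim_config and experiment_config are
    # hypothetical placeholders for configuration objects loaded elsewhere
    agent = DFSPLocalPPOAgent(
        defender_simulation_env_config=defender_sim_config,
        attacker_simulation_env_config=attacker_sim_config,
        emulation_env_config=None,  # optional; None when no emulation is used
        experiment_config=experiment_config,
    )
    execution = agent.train()  # runs local DFSP for all configured random seeds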

attacker_best_response(seed: int, defender_strategy: csle_common.dao.training.mixed_ppo_policy.MixedPPOPolicy, attacker_strategy: csle_common.dao.training.mixed_ppo_policy.MixedPPOPolicy) Tuple[csle_common.dao.training.ppo_policy.PPOPolicy, float][source]

Learns a best response strategy for the attacker against a given defender strategy

Parameters
  • seed – the random seed

  • defender_strategy – the defender strategy

  • attacker_strategy – the attacker strategy

Returns

the learned best response strategy and the average return

static compute_avg_metrics(metrics: Dict[str, List[Union[float, int]]]) Dict[str, Union[float, int]][source]

Computes the average metrics of a dict with aggregated metrics

Parameters

metrics – the dict with the aggregated metrics

Returns

the average metrics

defender_best_response(seed: int, defender_strategy: csle_common.dao.training.mixed_ppo_policy.MixedPPOPolicy, attacker_strategy: csle_common.dao.training.mixed_ppo_policy.MixedPPOPolicy) Tuple[csle_common.dao.training.ppo_policy.PPOPolicy, float][source]

Learns a best response for the defender against a given attacker strategy

Parameters
  • seed – the random seed

  • defender_strategy – the defender strategy

  • attacker_strategy – the attacker strategy

Returns

the learned best response strategy and the average return

evaluate_attacker_policy(defender_strategy: csle_common.dao.training.policy.Policy, attacker_strategy: csle_common.dao.training.policy.Policy) Dict[str, Union[float, int]][source]

Monte-Carlo evaluation of the game value of a given attacker strategy against the average defender strategy

Parameters
  • defender_strategy – the average defender strategy

  • attacker_strategy – the attacker strategy to evaluate

Returns

the average reward

evaluate_defender_policy(defender_strategy: csle_common.dao.training.policy.Policy, attacker_strategy: csle_common.dao.training.policy.Policy) Dict[str, Union[float, int]][source]

Monte-Carlo evaluation of the game value of a given defender policy against the average attacker strategy

Parameters
  • defender_strategy – the defender strategy to evaluate

  • attacker_strategy – the average attacker strategy

Returns

the average reward

evaluate_strategy_profile(defender_strategy: csle_common.dao.training.mixed_ppo_policy.MixedPPOPolicy, attacker_strategy: csle_common.dao.training.mixed_ppo_policy.MixedPPOPolicy) Dict[str, Union[float, int]][source]

Monte-Carlo evaluation of the game value following a given strategy profile

Parameters
  • defender_strategy – the average defender strategy

  • attacker_strategy – the average attacker strategy

Returns

the average reward

static exploitability(attacker_val: float, defender_val: float) float[source]

Computes the exploitability metric given the value of the attacker when following a best response against the current defender strategy and the value of the defender when following a best response against the current attacker strategy.

Parameters
  • attacker_val – the value of the attacker when following a best response against the current defender strategy

  • defender_val – the value of the defender when following a best response against the current attacker strategy

Returns

the exploitability

get_attacker_experiment_config() csle_common.dao.training.experiment_config.ExperimentConfig[source]

Returns

the experiment configuration for learning a best response of the attacker

get_defender_experiment_config() csle_common.dao.training.experiment_config.ExperimentConfig[source]

Returns

the experiment configuration for learning a best response of the defender

hparam_names() List[str][source]

Returns

a list with the hyperparameter names

local_dfsp(exp_result: csle_common.dao.training.experiment_result.ExperimentResult, seed: int, env: csle_common.dao.simulation_config.base_env.BaseEnv, training_job: csle_common.dao.jobs.training_job_config.TrainingJobConfig, random_seeds: List[int]) csle_common.dao.training.experiment_result.ExperimentResult[source]

Implements the local DFSP training logic

Parameters
  • exp_result – the experiment result

  • seed – the seed for the experiments

  • env – the environment for the experiment

  • training_job – the training job

  • random_seeds – the random seeds for the experiment

Returns

the updated experiment result

static round_vec(vec) List[float][source]

Rounds a vector to 3 decimals

Parameters

vec – the vector to round

Returns

the rounded vector

static running_average(x: List[float], N: int) List[float][source]

Calculates the running average of the last N elements of vector x

Parameters
  • x – the vector

  • N – the number of elements to use for average calculation

Returns

the running average vector

train() csle_common.dao.training.experiment_execution.ExperimentExecution[source]

Performs the policy training for the given random seeds using the local DFSP algorithm

Returns

the training metrics and the trained policies

static update_metrics(metrics: Dict[str, List[Union[float, int]]], info: Dict[str, Union[float, int]]) Dict[str, List[Union[float, int]]][source]

Updates a dict with aggregated metrics using new information from the environment

Parameters
  • metrics – the dict with the aggregated metrics

  • info – the new information

Returns

the updated dict

Module contents