csle_common.dao.training package

Submodules

csle_common.dao.training.agent_type module

class csle_common.dao.training.agent_type.AgentType(value)[source]

Bases: enum.IntEnum

Enum representing the different agent types in CSLE

BAYESIAN_OPTIMIZATION = 23
BAYESIAN_OPTIMIZATION_EMUKIT = 26
C51_CLEAN = 34
CMA_ES = 28
CROSS_ENTROPY = 13
DFSP_LOCAL = 24
DIFFERENTIAL_EVOLUTION = 12
DQN = 3
DQN_CLEAN = 33
DYNA_SEC = 22
FICTITIOUS_PLAY = 20
HSVI = 9
HSVI_OS_POSG = 19
KIEFER_WOLFOWITZ = 14
LINEAR_PROGRAMMING_CMDP = 25
LINEAR_PROGRAMMING_NORMAL_FORM = 21
MCS = 36
NELDER_MEAD = 29
NFSP = 5
NONE = 7
PARTICLE_SWARM = 30
POLICY_ITERATION = 17
POMCP = 32
PPG_CLEAN = 35
PPO = 1
PPO_CLEAN = 31
Q_LEARNING = 15
RANDOM = 6
REINFORCE = 4
SARSA = 16
SHAPLEY_ITERATION = 18
SIMULATED_ANNEALING = 27
SONDIK_VALUE_ITERATION = 10
T_FP = 2
T_SPSA = 0
VALUE_ITERATION = 8
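
Since AgentType is an IntEnum, members can be constructed from their integer values, which is how serialized configurations are typically restored. A minimal sketch:

    from csle_common.dao.training.agent_type import AgentType

    # Construct a member from its integer value, e.g. when parsing a stored config
    agent_type = AgentType(1)
    assert agent_type == AgentType.PPO

    # IntEnum members compare equal to their underlying integers
    assert AgentType.PPO == 1
    print(agent_type.name)  # "PPO"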

csle_common.dao.training.alpha_vectors_policy module

class csle_common.dao.training.alpha_vectors_policy.AlphaVectorsPolicy(player_type: csle_common.dao.training.player_type.PlayerType, actions: List[csle_common.dao.simulation_config.action.Action], alpha_vectors: List[Any], transition_tensor: List[Any], reward_tensor: List[Any], states: List[csle_common.dao.simulation_config.state.State], agent_type: csle_common.dao.training.agent_type.AgentType, simulation_name: str, avg_R: float)[source]

Bases: csle_common.dao.training.policy.Policy

Object representing a policy based on alpha vectors for a POMDP (Sondik 1971)

action(o: List[Union[int, float]], deterministic: bool = True) int[source]

Selects the next action

Parameters
  • o – the belief

  • deterministic – boolean flag indicating whether the action selection should be deterministic

Returns

the next action

copy() csle_common.dao.training.alpha_vectors_policy.AlphaVectorsPolicy[source]
Returns

a copy of the DTO

static from_dict(d: Dict[str, Any]) csle_common.dao.training.alpha_vectors_policy.AlphaVectorsPolicy[source]

Converts a dict representation to an instance

Parameters

d – the dict to convert

Returns

the created instance

static from_json_file(json_file_path: str) csle_common.dao.training.alpha_vectors_policy.AlphaVectorsPolicy[source]

Reads a json file and converts it to a DTO

Parameters

json_file_path – the json file path

Returns

the converted DTO

probability(o: List[Union[int, float]], a: int) float[source]

Calculates the probability of taking a given action for a given observation

Parameters
  • o – the input observation

  • a – the action

Returns

p(a|o)

stage_policy(o: Union[List[Union[int, float]], int, float]) List[List[float]][source]

Gets the stage policy, i.e. a |S|x|A| policy

Parameters

o – the latest observation

Returns

the |S|x|A| stage policy

to_dict() Dict[str, Any][source]
Returns

a dict representation of the policy
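
For intuition, standard alpha-vector action selection (Sondik 1971) evaluates a belief b against each vector and acts according to the maximizing one, V(b) = max_i alpha_i . b. A minimal sketch with hypothetical data (the class above wraps the same idea in CSLE DTOs):

    import numpy as np

    alpha_vectors = np.array([[1.0, 0.0],   # vector associated with action 0
                              [0.2, 0.9]])  # vector associated with action 1
    vector_to_action = [0, 1]

    def alpha_vector_action(belief):
        # act according to the vector that maximizes alpha_i . b
        values = alpha_vectors @ np.asarray(belief)
        return vector_to_action[int(np.argmax(values))]

    print(alpha_vector_action([0.3, 0.7]))  # -> 1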

csle_common.dao.training.dqn_policy module

class csle_common.dao.training.dqn_policy.DQNPolicy(model: Union[None, stable_baselines3.dqn.dqn.DQN, csle_common.models.q_network.QNetwork], simulation_name: str, save_path: str, player_type: csle_common.dao.training.player_type.PlayerType, states: List[csle_common.dao.simulation_config.state.State], actions: List[csle_common.dao.simulation_config.action.Action], experiment_config: csle_common.dao.training.experiment_config.ExperimentConfig, avg_R: float)[source]

Bases: csle_common.dao.training.policy.Policy

A neural network policy learned with DQN

action(o: List[float], deterministic: bool = True) numpy.ndarray[Any, numpy.dtype[Any]][source]

Selects the next action

Parameters
  • o – the current observation

  • deterministic – boolean flag indicating whether the action selection should be deterministic

Returns

the selected action

copy() csle_common.dao.training.dqn_policy.DQNPolicy[source]
Returns

a copy of the DTO

static from_dict(d: Dict[str, Any]) csle_common.dao.training.dqn_policy.DQNPolicy[source]

Converts a dict representation of the object to an instance

Parameters

d – the dict to convert

Returns

the created instance

static from_json_file(json_file_path: str) csle_common.dao.training.dqn_policy.DQNPolicy[source]

Reads a json file and converts it to a DTO

Parameters

json_file_path – the json file path

Returns

the converted DTO

probability(o: List[float], a) int[source]

Calculates the probability of taking a given action for a given observation

Parameters
  • o – the current observation

  • a – the action

Returns

the probability of the action

stage_policy(o: Union[List[int], List[float]]) List[List[float]][source]

Gets the stage policy, i.e. a |S|x|A| policy

Parameters

o – the latest observation

Returns

the |S|x|A| stage policy

to_dict() Dict[str, Any][source]
Returns

a dict representation of the policy
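
A typical usage pattern, assuming a DQNPolicy was previously serialized to JSON (the file path and observation below are hypothetical):

    from csle_common.dao.training.dqn_policy import DQNPolicy

    policy = DQNPolicy.from_json_file("/tmp/dqn_policy.json")  # hypothetical path
    o = [0.0, 0.5, 1.0]  # example observation; the expected shape depends on the simulation
    a = policy.action(o=o, deterministic=True)
    print(a)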

csle_common.dao.training.experiment_config module

class csle_common.dao.training.experiment_config.ExperimentConfig(output_dir: str, title: str, random_seeds: List[int], agent_type: csle_common.dao.training.agent_type.AgentType, hparams: Dict[str, csle_common.dao.training.hparam.HParam], log_every: int, player_type: csle_common.dao.training.player_type.PlayerType, player_idx: int, br_log_every: int = 10)[source]

Bases: csle_base.json_serializable.JSONSerializable

DTO representing the configuration of an experiment

static from_dict(d: Dict[str, Any]) csle_common.dao.training.experiment_config.ExperimentConfig[source]

Converts a dict representation to an instance

Parameters

d – the dict to convert

Returns

the created instance

static from_json_file(json_file_path: str) csle_common.dao.training.experiment_config.ExperimentConfig[source]

Reads a json file and converts it to a DTO

Parameters

json_file_path – the json file path

Returns

the converted DTO

to_dict() Dict[str, Any][source]

Converts the object to a dict representation

Returns

a dict representation of the object
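
A sketch of constructing an ExperimentConfig from the documented constructor; the hyperparameter names ("gamma", "N") and values are illustrative, not a fixed schema:

    from csle_common.dao.training.agent_type import AgentType
    from csle_common.dao.training.experiment_config import ExperimentConfig
    from csle_common.dao.training.hparam import HParam
    from csle_common.dao.training.player_type import PlayerType

    hparams = {
        "gamma": HParam(value=0.99, name="gamma", descr="discount factor"),
        "N": HParam(value=1000, name="N", descr="number of training iterations"),
    }
    config = ExperimentConfig(
        output_dir="/tmp/ppo_experiment", title="PPO training run",
        random_seeds=[399, 98912], agent_type=AgentType.PPO, hparams=hparams,
        log_every=1, player_type=PlayerType.DEFENDER, player_idx=0)
    print(config.to_dict())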

csle_common.dao.training.experiment_execution module

class csle_common.dao.training.experiment_execution.ExperimentExecution(config: csle_common.dao.training.experiment_config.ExperimentConfig, result: csle_common.dao.training.experiment_result.ExperimentResult, timestamp: float, emulation_name: str, simulation_name: str, descr: str, log_file_path: str)[source]

Bases: csle_base.json_serializable.JSONSerializable

DTO representing an experiment execution

static from_dict(d: Dict[str, Any]) csle_common.dao.training.experiment_execution.ExperimentExecution[source]

Converts a dict representation of the object to an instance

Parameters

d – the dict to convert

Returns

the created instance

static from_json_file(json_file_path: str) csle_common.dao.training.experiment_execution.ExperimentExecution[source]

Reads a json file and converts it to a DTO

Parameters

json_file_path – the json file path

Returns

the converted DTO

to_dict() Dict[str, Any][source]

Converts the object to a dict representation

Returns

a dict representation of the object

csle_common.dao.training.experiment_result module

class csle_common.dao.training.experiment_result.ExperimentResult[source]

Bases: csle_base.json_serializable.JSONSerializable

DTO representing the results of an experiment

static from_dict(d: Dict[str, Any]) csle_common.dao.training.experiment_result.ExperimentResult[source]

Converts a dict representation to an instance

Parameters

d – the dict to convert

Returns

the created instance

static from_json_file(json_file_path: str) csle_common.dao.training.experiment_result.ExperimentResult[source]

Reads a json file and converts it to a DTO

Parameters

json_file_path – the json file path

Returns

the converted DTO

to_dict() Dict[str, Any][source]

Converts the object to a dict representation

Returns

a dict representation of the object

csle_common.dao.training.fnn_with_softmax_policy module

class csle_common.dao.training.fnn_with_softmax_policy.FNNWithSoftmaxPolicy(policy_network, simulation_name: str, save_path: str, player_type: csle_common.dao.training.player_type.PlayerType, states: List[csle_common.dao.simulation_config.state.State], actions: List[csle_common.dao.simulation_config.action.Action], experiment_config: csle_common.dao.training.experiment_config.ExperimentConfig, avg_R: float, input_dim: int, output_dim: int)[source]

Bases: csle_common.dao.training.policy.Policy

A feed-forward neural network policy with softmax output

action(o: numpy.ndarray[Any, numpy.dtype[Any]], deterministic: bool = True) Any[source]

Selects the next action

Parameters
  • o – the current observation

  • deterministic – boolean flag indicating whether the action selection should be deterministic

Returns

the selected action

copy() csle_common.dao.training.fnn_with_softmax_policy.FNNWithSoftmaxPolicy[source]
Returns

a copy of the DTO

static from_dict(d: Dict[str, Any]) csle_common.dao.training.fnn_with_softmax_policy.FNNWithSoftmaxPolicy[source]

Converts a dict representation of the object to an instance

Parameters

d – the dict to convert

Returns

the created instance

static from_json_file(json_file_path: str) csle_common.dao.training.fnn_with_softmax_policy.FNNWithSoftmaxPolicy[source]

Reads a json file and converts it to a DTO

Parameters

json_file_path – the json file path

Returns

the converted DTO

get_action_and_log_prob(state: numpy.ndarray[Any, numpy.dtype[Any]]) Tuple[int, float][source]

Samples an action from the policy network

Parameters

state – the state to sample an action for

Returns

The sampled action id and the log probability

probability(o: numpy.ndarray[Any, numpy.dtype[Any]], a: int) Union[int, float][source]

Calculates the probability of taking a given action for a given observation

Parameters
  • o – the current observation

  • a – the action

Returns

the probability of the action

save_policy_network() None[source]

Saves the PyTorch model weights

Returns

None

stage_policy(o: Union[List[int], List[float]]) List[List[float]][source]

Gets the stage policy, i.e. a |S|x|A| policy

Parameters

o – the latest observation

Returns

the |S|x|A| stage policy

to_dict() Dict[str, Any][source]
Returns

a dict representation of the policy

csle_common.dao.training.hparam module

class csle_common.dao.training.hparam.HParam(value: Union[int, float, str, List[Any]], name: str, descr: str)[source]

Bases: csle_base.json_serializable.JSONSerializable

DTO class representing a hyperparameter

static from_dict(d: Dict[str, Any]) csle_common.dao.training.hparam.HParam[source]

Creates an instance from a dict representation

Parameters

d – the dict representation

Returns

the instance

static from_json_file(json_file_path: str) csle_common.dao.training.hparam.HParam[source]

Reads a json file and converts it to a DTO

Parameters

json_file_path – the json file path

Returns

the converted DTO

to_dict() Dict[str, Any][source]

Converts the object to a dict representation

Returns

a dict representation of the object
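
All DTOs in this package share the same serialization round trip via to_dict()/from_dict(); HParam is small enough to show it directly (assuming, as is the convention for these DTOs, that constructor arguments are stored as attributes of the same name):

    from csle_common.dao.training.hparam import HParam

    hp = HParam(value=0.001, name="learning_rate", descr="step size of the optimizer")
    d = hp.to_dict()            # serialize to a plain dict
    hp2 = HParam.from_dict(d)   # reconstruct the DTO
    assert hp2.name == hp.name and hp2.value == hp.value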

csle_common.dao.training.linear_tabular_policy module

class csle_common.dao.training.linear_tabular_policy.LinearTabularPolicy(stopping_policy: csle_common.dao.training.linear_threshold_stopping_policy.LinearThresholdStoppingPolicy, action_policy: csle_common.dao.training.tabular_policy.TabularPolicy, simulation_name: str, states: List[csle_common.dao.simulation_config.state.State], player_type: csle_common.dao.training.player_type.PlayerType, actions: List[csle_common.dao.simulation_config.action.Action], experiment_config: Optional[csle_common.dao.training.experiment_config.ExperimentConfig], avg_R: float, agent_type: csle_common.dao.training.agent_type.AgentType)[source]

Bases: csle_common.dao.training.policy.Policy

A linear tabular policy that uses a linear threshold to decide when to take an action and a tabular policy to decide which action to take

action(o: List[float], deterministic: bool = True) Union[int, List[int], float, numpy.ndarray[Any, numpy.dtype[Any]]][source]

Selects the next action

Parameters
  • o – the current observation

  • deterministic – boolean flag indicating whether the action selection should be deterministic

Returns

the selected action

copy() csle_common.dao.training.linear_tabular_policy.LinearTabularPolicy[source]
Returns

a copy of the DTO

static from_dict(d: Dict[str, Any]) csle_common.dao.training.linear_tabular_policy.LinearTabularPolicy[source]

Converts a dict representation of the object to an instance

Parameters

d – the dict to convert

Returns

the created instance

static from_json_file(json_file_path: str) csle_common.dao.training.linear_tabular_policy.LinearTabularPolicy[source]

Reads a json file and converts it to a DTO

Parameters

json_file_path – the json file path

Returns

the converted DTO

probability(o: List[float], a: int) int[source]

Probability of a given action

Parameters
  • o – the current observation

  • a – a given action

Returns

the probability of a

stage_policy(o: Any) Any[source]

Gets the stage policy, i.e. a |S|x|A| policy

Parameters

o – the latest observation

Returns

the |S|x|A| stage policy

to_dict() Dict[str, List[float]][source]
Returns

a dict representation of the policy

csle_common.dao.training.linear_threshold_stopping_policy module

class csle_common.dao.training.linear_threshold_stopping_policy.LinearThresholdStoppingPolicy(theta, simulation_name: str, L: int, states: List[csle_common.dao.simulation_config.state.State], player_type: csle_common.dao.training.player_type.PlayerType, actions: List[csle_common.dao.simulation_config.action.Action], experiment_config: Optional[csle_common.dao.training.experiment_config.ExperimentConfig], avg_R: float, agent_type: csle_common.dao.training.agent_type.AgentType, opponent_strategy: Optional[csle_common.dao.training.linear_threshold_stopping_policy.LinearThresholdStoppingPolicy] = None)[source]

Bases: csle_common.dao.training.policy.Policy

A linear threshold stopping policy

action(o: List[float], deterministic: bool = True) int[source]

Selects the next action

Parameters
  • o – the current observation

  • deterministic – boolean flag indicating whether the action selection should be deterministic

Returns

the selected action

copy() csle_common.dao.training.linear_threshold_stopping_policy.LinearThresholdStoppingPolicy[source]
Returns

a copy of the DTO

static from_dict(d: Dict[str, Any]) csle_common.dao.training.linear_threshold_stopping_policy.LinearThresholdStoppingPolicy[source]

Converts a dict representation of the object to an instance

Parameters

d – the dict to convert

Returns

the created instance

static from_json_file(json_file_path: str) csle_common.dao.training.linear_threshold_stopping_policy.LinearThresholdStoppingPolicy[source]

Reads a json file and converts it to a DTO

Parameters

json_file_path – the json file path

Returns

the converted DTO

probability(o: List[float], a: int) float[source]

Probability of a given action

Parameters
  • o – the current observation

  • a – a given action

Returns

the probability of a

stage_policy(o: Union[List[Union[int, float]], int, float]) List[List[float]][source]

Gets the stage policy, i.e. a |S|x|A| policy

Parameters

o – the latest observation

Returns

the |S|x|A| stage policy

to_dict() Dict[str, Any][source]
Returns

a dict representation of the policy
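
The decision rule behind such a policy can be sketched as stopping when a linear function of the observation crosses a threshold; the exact parameterization used by LinearThresholdStoppingPolicy (the meaning of theta) may differ from this minimal illustration:

    import numpy as np

    theta = np.array([2.0, -1.0, 0.5])  # hypothetical threshold parameters

    def linear_threshold_action(o):
        # 1 = stop, 0 = continue; stop when theta^T o crosses zero
        return 1 if float(np.dot(theta, o)) >= 0.0 else 0

    print(linear_threshold_action([1.0, 1.0, 1.0]))  # -> 1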

csle_common.dao.training.mixed_linear_tabular module

class csle_common.dao.training.mixed_linear_tabular.MixedLinearTabularPolicy(simulation_name: str, player_type: csle_common.dao.training.player_type.PlayerType, states: List[csle_common.dao.simulation_config.state.State], actions: List[csle_common.dao.simulation_config.action.Action], experiment_config: csle_common.dao.training.experiment_config.ExperimentConfig, avg_R: float)[source]

Bases: csle_common.dao.training.policy.Policy

A mixed policy using an ensemble of linear tabular policies

action(o: List[float], deterministic: bool = True) Union[int, List[int], float, numpy.ndarray[Any, numpy.dtype[Any]]][source]

Selects the next action

Parameters
  • o – the current observation

  • deterministic – boolean flag indicating whether the action selection should be deterministic

Returns

the selected action

copy() csle_common.dao.training.mixed_linear_tabular.MixedLinearTabularPolicy[source]
Returns

a copy of the DTO

static from_dict(d: Dict[str, Any]) csle_common.dao.training.mixed_linear_tabular.MixedLinearTabularPolicy[source]

Converts a dict representation of the object to an instance

Parameters

d – the dict to convert

Returns

the created instance

static from_json_file(json_file_path: str) csle_common.dao.training.mixed_linear_tabular.MixedLinearTabularPolicy[source]

Reads a json file and converts it to a DTO

Parameters

json_file_path – the json file path

Returns

the converted DTO

probability(o: List[float], a: int) int[source]

Probability of a given action

Parameters
  • o – the current observation

  • a – a given action

Returns

the probability of a

stage_policy(o: Union[List[Union[int, float]], int, float]) List[List[float]][source]

Returns the stage policy for a given observation

Parameters

o – the observation to return the stage policy for

Returns

the stage policy

to_dict() Dict[str, Any][source]
Returns

a dict representation of the policy

csle_common.dao.training.mixed_multi_threshold_stopping_policy module

class csle_common.dao.training.mixed_multi_threshold_stopping_policy.MixedMultiThresholdStoppingPolicy(defender_Theta: List[List[List[float]]], attacker_Theta: List[List[List[List[float]]]], simulation_name: str, L: int, states: List[csle_common.dao.simulation_config.state.State], player_type: csle_common.dao.training.player_type.PlayerType, actions: List[csle_common.dao.simulation_config.action.Action], experiment_config: Optional[csle_common.dao.training.experiment_config.ExperimentConfig], avg_R: float, agent_type: csle_common.dao.training.agent_type.AgentType, opponent_strategy: Optional[csle_common.dao.training.mixed_multi_threshold_stopping_policy.MixedMultiThresholdStoppingPolicy] = None)[source]

Bases: csle_common.dao.training.policy.Policy

A mixed multi-threshold stopping policy

action(o: List[float], deterministic: bool = True) int[source]

Selects the next action

Parameters
  • o – the current observation

  • deterministic – boolean flag indicating whether the action selection should be deterministic

Returns

the selected action

copy() csle_common.dao.training.mixed_multi_threshold_stopping_policy.MixedMultiThresholdStoppingPolicy[source]
Returns

a copy of the DTO

static from_dict(d: Dict[str, Any]) csle_common.dao.training.mixed_multi_threshold_stopping_policy.MixedMultiThresholdStoppingPolicy[source]

Converts a dict representation of the object to an instance

Parameters

d – the dict to convert

Returns

the created instance

static from_json_file(json_file_path: str) csle_common.dao.training.mixed_multi_threshold_stopping_policy.MixedMultiThresholdStoppingPolicy[source]

Reads a json file and converts it to a DTO

Parameters

json_file_path – the json file path

Returns

the converted DTO

probability(o: List[float], a: int) int[source]

Probability of a given action

Parameters
  • o – the current observation

  • a – a given action

Returns

the probability of a

stage_policy(o: List[Union[int, float]]) List[List[float]][source]

Returns the stage policy for a given observation

Parameters

o – the observation to return the stage policy for

Returns

the stage policy

stop_distributions() Dict[str, List[float]][source]
Returns

the stop distributions and their names

to_dict() Dict[str, Any][source]
Returns

a dict representation of the policy

csle_common.dao.training.mixed_ppo_policy module

class csle_common.dao.training.mixed_ppo_policy.MixedPPOPolicy(simulation_name: str, player_type: csle_common.dao.training.player_type.PlayerType, states: List[csle_common.dao.simulation_config.state.State], actions: List[csle_common.dao.simulation_config.action.Action], experiment_config: csle_common.dao.training.experiment_config.ExperimentConfig, avg_R: float)[source]

Bases: csle_common.dao.training.policy.Policy

A mixed policy using an ensemble of neural network policies learned through PPO

action(o: List[float], deterministic: bool = True) Union[int, float, numpy.ndarray[Any, numpy.dtype[Any]]][source]

Selects the next action

Parameters
  • o – the current observation

  • deterministic – boolean flag indicating whether the action selection should be deterministic

Returns

the selected action

copy() csle_common.dao.training.mixed_ppo_policy.MixedPPOPolicy[source]
Returns

a copy of the DTO

static from_dict(d: Dict[str, Any]) csle_common.dao.training.mixed_ppo_policy.MixedPPOPolicy[source]

Converts a dict representation of the object to an instance

Parameters

d – the dict to convert

Returns

the created instance

static from_json_file(json_file_path: str) csle_common.dao.training.mixed_ppo_policy.MixedPPOPolicy[source]

Reads a json file and converts it to a DTO

Parameters

json_file_path – the json file path

Returns

the converted DTO

probability(o: List[float], a: int) float[source]

Probability of a given action

Parameters
  • o – the current observation

  • a – a given action

Returns

the probability of a

stage_policy(o: Union[List[int], List[float]]) List[List[float]][source]

Returns the stage policy for a given observation

Parameters

o – the observation to return the stage policy for

Returns

the stage policy

to_dict() Dict[str, Any][source]
Returns

a dict representation of the policy

csle_common.dao.training.multi_threshold_stopping_policy module

class csle_common.dao.training.multi_threshold_stopping_policy.MultiThresholdStoppingPolicy(theta: List[float], simulation_name: str, L: int, states: List[csle_common.dao.simulation_config.state.State], player_type: csle_common.dao.training.player_type.PlayerType, actions: List[csle_common.dao.simulation_config.action.Action], experiment_config: Optional[csle_common.dao.training.experiment_config.ExperimentConfig], avg_R: float, agent_type: csle_common.dao.training.agent_type.AgentType, opponent_strategy: Optional[csle_common.dao.training.multi_threshold_stopping_policy.MultiThresholdStoppingPolicy] = None)[source]

Bases: csle_common.dao.training.policy.Policy

A multi-threshold stopping policy

action(o: List[float], deterministic: bool = True) int[source]

Selects the next action

Parameters
  • o – the current observation

  • deterministic – boolean flag indicating whether the action selection should be deterministic

Returns

the selected action

copy() csle_common.dao.training.multi_threshold_stopping_policy.MultiThresholdStoppingPolicy[source]
Returns

a copy of the DTO

static from_dict(d: Dict[str, Any]) csle_common.dao.training.multi_threshold_stopping_policy.MultiThresholdStoppingPolicy[source]

Converts a dict representation of the object to an instance

Parameters

d – the dict to convert

Returns

the created instance

static from_json_file(json_file_path: str) csle_common.dao.training.multi_threshold_stopping_policy.MultiThresholdStoppingPolicy[source]

Reads a json file and converts it to a DTO

Parameters

json_file_path – the json file path

Returns

the converted DTO

static inverse_sigmoid(y) float[source]

The inverse sigmoid function

Parameters

y – sigmoid(x)

Returns

x, i.e. sigmoid^(-1)(y)

probability(o: List[float], a: int) float[source]

Probability of a given action

Parameters
  • o – the current observation

  • a – a given action

Returns

the probability of a

static sigmoid(x) float[source]

The sigmoid function

Parameters

x – the input

Returns

sigmoid(x)

static smooth_threshold_action_selection(threshold: float, b1: float, threshold_action: int = 1, alternative_action: int = 1, k=-20) Tuple[int, float][source]

Selects the next action according to a smooth threshold function on the belief

Parameters
  • threshold – the threshold

  • b1 – the belief

  • threshold_action – the action to select if the threshold is exceeded

  • alternative_action – the alternative action to select if the threshold is not exceeded

  • k – parameter controlling the smoothness (steepness) of the threshold function

Returns

the selected action and the probability

stage_policy(o: Union[List[int], List[float]]) List[List[float]][source]

Gets the stage policy, i.e. a |S|x|A| policy

Parameters

o – the latest observation

Returns

the |S|x|A| stage policy

stop_distributions() Dict[str, List[float]][source]
Returns

the stop distributions and their names

static stopping_probability(b1, threshold, k=-20) float[source]

Returns the probability of stopping given a belief and a threshold

Parameters
  • b1 – the belief

  • threshold – the threshold

  • k – parameter controlling the smoothness (steepness) of the threshold function

Returns

the stopping probability

thresholds() List[float][source]
Returns

the thresholds

to_dict() Dict[str, Any][source]
Returns

a dict representation of the policy
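
The smooth threshold underlying stopping_probability and smooth_threshold_action_selection can be pictured as a sigmoid-like curve that sharpens as |k| grows. The sketch below shows one standard construction of such a curve for beliefs in (0, 1); the exact expression used by this class may differ:

    import math

    def smooth_stopping_probability(b1: float, threshold: float, k: float = -20) -> float:
        # smooth approximation of a hard threshold at b1 = threshold,
        # valid for 0 < b1 < 1 and 0 < threshold < 1
        odds = (b1 * (1 - threshold)) / (threshold * (1 - b1))
        return 1.0 / (1.0 + math.pow(odds, k))

    print(smooth_stopping_probability(b1=0.8, threshold=0.5))  # close to 1.0
    print(smooth_stopping_probability(b1=0.2, threshold=0.5))  # close to 0.0
    print(smooth_stopping_probability(b1=0.5, threshold=0.5))  # exactly 0.5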

csle_common.dao.training.player_type module

class csle_common.dao.training.player_type.PlayerType(value)[source]

Bases: enum.IntEnum

Enum representing the different player types in CSLE

ATTACKER = 2
DEFENDER = 1
SELF_PLAY = 3

csle_common.dao.training.policy module

class csle_common.dao.training.policy.Policy(agent_type: csle_common.dao.training.agent_type.AgentType, player_type: csle_common.dao.training.player_type.PlayerType)[source]

Bases: csle_base.json_serializable.JSONSerializable

An abstract class representing a policy

abstract action(o: Any, deterministic: bool) Union[int, List[int], float, numpy.ndarray[Any, numpy.dtype[Any]]][source]

Calculates the next action

Parameters
  • o – the input observation

  • deterministic – boolean flag indicating whether the action selection should be deterministic

Returns

the action

abstract copy() csle_common.dao.training.policy.Policy[source]
Returns

a copy of the object

abstract static from_dict(d: Dict[str, Any]) csle_common.dao.training.policy.Policy[source]

Converts a dict representation of the object to an instance

Parameters

d – the dict representation to convert

Returns

the converted object

abstract static from_json_file(json_file_path: str) csle_common.dao.training.policy.Policy[source]

Reads a json file and converts it to a DTO

Parameters

json_file_path – the json file path

Returns

the converted DTO

abstract probability(o: Any, a: int) Union[int, float][source]

Calculates the probability of a given action for a given observation

Parameters
  • o – the observation

  • a – the action

Returns

the probability

abstract stage_policy(o: Any) Union[List[List[float]], List[float]][source]

Returns a stage policy (see Horak & Bosansky 2019)

Parameters

o – the observation for the stage

Returns

the stage policy

abstract to_dict() Dict[str, Any][source]

Converts the object to a dict representation

Returns

a dict representation of the object
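
A minimal sketch of a concrete subclass, showing which abstract methods must be implemented; UniformPolicy is a hypothetical example, not part of the package:

    import json
    import random
    from typing import Any, Dict, List
    from csle_common.dao.training.agent_type import AgentType
    from csle_common.dao.training.player_type import PlayerType
    from csle_common.dao.training.policy import Policy

    class UniformPolicy(Policy):
        """Hypothetical policy that selects uniformly among num_actions actions"""

        def __init__(self, num_actions: int) -> None:
            super().__init__(agent_type=AgentType.RANDOM, player_type=PlayerType.DEFENDER)
            self.num_actions = num_actions

        def action(self, o: Any, deterministic: bool) -> int:
            return 0 if deterministic else random.randrange(self.num_actions)

        def probability(self, o: Any, a: int) -> float:
            return 1.0 / self.num_actions

        def stage_policy(self, o: Any) -> List[List[float]]:
            return [[1.0 / self.num_actions] * self.num_actions]

        def copy(self) -> "UniformPolicy":
            return UniformPolicy(num_actions=self.num_actions)

        def to_dict(self) -> Dict[str, Any]:
            return {"num_actions": self.num_actions}

        @staticmethod
        def from_dict(d: Dict[str, Any]) -> "UniformPolicy":
            return UniformPolicy(num_actions=d["num_actions"])

        @staticmethod
        def from_json_file(json_file_path: str) -> "UniformPolicy":
            with open(json_file_path, "r") as f:
                return UniformPolicy.from_dict(json.load(f))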

csle_common.dao.training.policy_type module

class csle_common.dao.training.policy_type.PolicyType(value)[source]

Bases: enum.IntEnum

Enum representing the different policy types in CSLE

ALPHA_VECTORS = 1
C51 = 13
DQN = 2
FNN_W_SOFTMAX = 3
LINEAR_TABULAR = 11
LINEAR_THRESHOLD = 9
MIXED_LINEAR_TABULAR = 12
MIXED_MULTI_THRESHOLD = 4
MIXED_PPO_POLICY = 10
MULTI_THRESHOLD = 8
PPO = 5
RANDOM = 6
TABULAR = 0
VECTOR = 7

csle_common.dao.training.ppo_policy module

class csle_common.dao.training.ppo_policy.PPOPolicy(model: Union[None, stable_baselines3.ppo.ppo.PPO, csle_common.models.ppo_network.PPONetwork], simulation_name: str, save_path: str, player_type: csle_common.dao.training.player_type.PlayerType, states: List[csle_common.dao.simulation_config.state.State], actions: List[csle_common.dao.simulation_config.action.Action], experiment_config: csle_common.dao.training.experiment_config.ExperimentConfig, avg_R: float)[source]

Bases: csle_common.dao.training.policy.Policy

A neural network policy learned with PPO

action(o: Union[List[float], List[int]], deterministic: bool = True) Union[int, float, numpy.ndarray[Any, numpy.dtype[Any]]][source]

Selects the next action

Parameters
  • o – the current observation

  • deterministic – boolean flag indicating whether the action selection should be deterministic

Returns

the selected action

copy() csle_common.dao.training.ppo_policy.PPOPolicy[source]
Returns

a copy of the DTO

static from_dict(d: Dict[str, Any]) csle_common.dao.training.ppo_policy.PPOPolicy[source]

Converts a dict representation of the object to an instance

Parameters

d – the dict to convert

Returns

the created instance

static from_json_file(json_file_path: str) csle_common.dao.training.ppo_policy.PPOPolicy[source]

Reads a json file and converts it to a DTO

Parameters

json_file_path – the json file path

Returns

the converted DTO

load() None[source]

Attempts to load the policy from disk

Returns

None

probability(o: Union[List[float], List[int]], a: int) float[source]

Calculates the probability of taking a given action for a given observation

Parameters
  • o – the current observation

  • a – the action

Returns

the probability of the action

stage_policy(o: Union[List[int], List[float]]) List[List[float]][source]

Gets the stage policy, i.e. a |S|x|A| policy

Parameters

o – the latest observation

Returns

the |S|x|A| stage policy

to_dict() Dict[str, Any][source]
Returns

a dict representation of the policy

value(o: Union[List[float], List[int]]) float[source]

Gets the value of a given observation, computed by the critic network

Parameters

o – the observation to get the value of

Returns

V(o)
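
A typical usage pattern, assuming a PPOPolicy was previously serialized to JSON (the file path and observation below are hypothetical):

    from csle_common.dao.training.ppo_policy import PPOPolicy

    policy = PPOPolicy.from_json_file("/tmp/ppo_policy.json")  # hypothetical path
    o = [0.5, 0.2]  # example observation
    a = policy.action(o=o, deterministic=True)  # select an action
    v = policy.value(o=o)                       # critic estimate V(o)
    print(a, v)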

csle_common.dao.training.random_policy module

class csle_common.dao.training.random_policy.RandomPolicy(player_type: csle_common.dao.training.player_type.PlayerType, actions: List[csle_common.dao.simulation_config.action.Action], stage_policy_tensor: Optional[List[List[float]]])[source]

Bases: csle_common.dao.training.policy.Policy

Object representing a static policy

action(o: Union[List[Union[int, float]], int, float], deterministic: bool = True) int[source]

Selects the next action

Parameters
  • o – the input observation

  • deterministic – boolean flag indicating whether the action selection should be deterministic

Returns

the next action

copy() csle_common.dao.training.random_policy.RandomPolicy[source]
Returns

a copy of the DTO

static from_dict(d: Dict[str, Any]) csle_common.dao.training.random_policy.RandomPolicy[source]

Converts a dict representation to an instance

Parameters

d – the dict to convert

Returns

the created instance

static from_json_file(json_file_path: str) csle_common.dao.training.random_policy.RandomPolicy[source]

Reads a json file and converts it to a DTO

Parameters

json_file_path – the json file path

Returns

the converted DTO

probability(o: Union[List[Union[int, float]], int, float], a: int) float[source]

Calculates the probability of taking a given action for a given observation

Parameters
  • o – the input observation

  • a – the action

Returns

p(a|o)

stage_policy(o: Any) Union[List[List[float]], List[float]][source]

Gets the stage policy, i.e. a |S|x|A| policy

Parameters

o – the latest observation

Returns

the |S|x|A| stage policy

to_dict() Dict[str, Any][source]
Returns

a dict representation of the policy

csle_common.dao.training.tabular_policy module

class csle_common.dao.training.tabular_policy.TabularPolicy(player_type: csle_common.dao.training.player_type.PlayerType, actions: List[csle_common.dao.simulation_config.action.Action], lookup_table: List[List[float]], agent_type: csle_common.dao.training.agent_type.AgentType, simulation_name: str, avg_R: float, value_function: Optional[List[Any]] = None, q_table: Optional[List[Any]] = None)[source]

Bases: csle_common.dao.training.policy.Policy

Object representing a tabular policy

action(o: Union[int, float], deterministic: bool = True) Union[int, List[int], float, numpy.ndarray[Any, numpy.dtype[Any]]][source]

Selects the next action

Parameters
  • o – the input observation

  • deterministic – boolean flag indicating whether the action selection should be deterministic

Returns

the next action

copy() csle_common.dao.training.tabular_policy.TabularPolicy[source]
Returns

a copy of the DTO

static from_dict(d: Dict[str, Any]) csle_common.dao.training.tabular_policy.TabularPolicy[source]

Converts a dict representation to an instance

Parameters

d – the dict to convert

Returns

the created instance

static from_json_file(json_file_path: str) csle_common.dao.training.tabular_policy.TabularPolicy[source]

Reads a json file and converts it to a DTO

Parameters

json_file_path – the json file path

Returns

the converted DTO

probability(o: Union[int, float], a: int) float[source]

Calculates the probability of taking a given action for a given observation

Parameters
  • o – the input observation

  • a – the action

Returns

p(a|o)

stage_policy(o: Union[List[Union[int, float]], int, float]) List[List[float]][source]

Gets the stage policy, i.e. a |S|x|A| policy

Parameters

o – the latest observation

Returns

the |S|x|A| stage policy

to_dict() Dict[str, Any][source]
Returns

a dict representation of the policy
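
A sketch of constructing a small TabularPolicy with one row of action probabilities per state; it assumes Action takes id and descr arguments, as elsewhere in csle_common, and the agent type is arbitrary here:

    from csle_common.dao.simulation_config.action import Action
    from csle_common.dao.training.agent_type import AgentType
    from csle_common.dao.training.player_type import PlayerType
    from csle_common.dao.training.tabular_policy import TabularPolicy

    actions = [Action(id=0, descr="continue"), Action(id=1, descr="stop")]
    lookup_table = [[0.9, 0.1],   # state 0: mostly continue
                    [0.2, 0.8]]   # state 1: mostly stop
    policy = TabularPolicy(
        player_type=PlayerType.DEFENDER, actions=actions, lookup_table=lookup_table,
        agent_type=AgentType.VALUE_ITERATION, simulation_name="example-simulation",
        avg_R=0.0)
    print(policy.probability(o=1, a=1))  # expected: 0.8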

csle_common.dao.training.vector_policy module

class csle_common.dao.training.vector_policy.VectorPolicy(player_type: csle_common.dao.training.player_type.PlayerType, actions: List[int], policy_vector: List[float], agent_type: csle_common.dao.training.agent_type.AgentType, simulation_name: str, avg_R: float)[source]

Bases: csle_common.dao.training.policy.Policy

Object representing a vector policy

action(o: Union[List[Union[int, float]], int, float], deterministic: bool = True) Union[int, List[int], float, numpy.ndarray[Any, numpy.dtype[Any]]][source]

Selects the next action

Parameters
  • o – the input observation

  • deterministic – boolean flag indicating whether the action selection should be deterministic

Returns

the next action

copy() csle_common.dao.training.vector_policy.VectorPolicy[source]
Returns

a copy of the DTO

static from_dict(d: Dict[str, Any]) csle_common.dao.training.vector_policy.VectorPolicy[source]

Converts a dict representation to an instance

Parameters

d – the dict to convert

Returns

the created instance

static from_json_file(json_file_path: str) csle_common.dao.training.vector_policy.VectorPolicy[source]

Reads a json file and converts it to a DTO

Parameters

json_file_path – the json file path

Returns

the converted DTO

probability(o: Union[List[Union[int, float]], int, float], a: int) float[source]

Calculates the probability of taking a given action for a given observation

Parameters
  • o – the input observation

  • a – the action

Returns

p(a|o)

stage_policy(o: Union[List[Union[int, float]], int, float]) List[float][source]

Gets the stage policy, i.e. the action distribution for the stage

Parameters

o – the latest observation

Returns

the stage policy (a vector of action probabilities)

to_dict() Dict[str, Any][source]

Gets a dict representation of the object

Returns

A dict representation of the object

Module contents