csle_common.dao.training package

Submodules

csle_common.dao.training.agent_type module

class csle_common.dao.training.agent_type.AgentType(value)[source]

Bases: enum.IntEnum

Enum representing the different agent types in CSLE

BAYESIAN_OPTIMIZATION = 23
BAYESIAN_OPTIMIZATION_EMUKIT = 26
C51_CLEAN = 34
CMA_ES = 28
CROSS_ENTROPY = 13
DFSP_LOCAL = 24
DIFFERENTIAL_EVOLUTION = 12
DQN = 3
DQN_CLEAN = 33
DYNA_SEC = 22
FICTITIOUS_PLAY = 20
HSVI = 9
HSVI_OS_POSG = 19
KIEFER_WOLFOWITZ = 14
LINEAR_PROGRAMMING_CMDP = 25
LINEAR_PROGRAMMING_NORMAL_FORM = 21
MCS = 36
NELDER_MEAD = 29
NFSP = 5
NONE = 7
PARTICLE_SWARM = 30
POLICY_ITERATION = 17
POMCP = 32
PPG_CLEAN = 35
PPO = 1
PPO_CLEAN = 31
Q_LEARNING = 15
RANDOM = 6
REINFORCE = 4
SARSA = 16
SHAPLEY_ITERATION = 18
SIMULATED_ANNEALING = 27
SONDIK_VALUE_ITERATION = 10
T_FP = 2
T_SPSA = 0
VALUE_ITERATION = 8
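
Since AgentType is an IntEnum, members can be constructed from their integer values, which is how serialized configurations are typically restored. A minimal sketch:

    from csle_common.dao.training.agent_type import AgentType

    # Construct a member from its integer value, e.g. when parsing a stored config
    agent_type = AgentType(1)
    assert agent_type == AgentType.PPO

    # IntEnum members compare equal to their underlying integers
    assert AgentType.PPO == 1
    print(agent_type.name)  # "PPO"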

csle_common.dao.training.alpha_vectors_policy module

class csle_common.dao.training.alpha_vectors_policy.AlphaVectorsPolicy(player_type: csle_common.dao.training.player_type.PlayerType, actions: List[csle_common.dao.simulation_config.action.Action], alpha_vectors: List[Any], transition_tensor: List[Any], reward_tensor: List[Any], states: List[csle_common.dao.simulation_config.state.State], agent_type: csle_common.dao.training.agent_type.AgentType, simulation_name: str, avg_R: float)[source]

Bases: csle_common.dao.training.policy.Policy

Object representing a policy based on alpha vectors for a POMDP (Sondik 1971)

action(o: List[Union[int, float]], deterministic: bool = True) int[source]

Selects the next action

Parameters
  • o – the belief

  • deterministic – boolean flag indicating whether the action selection should be deterministic

Returns

the next action

copy() csle_common.dao.training.alpha_vectors_policy.AlphaVectorsPolicy[source]
Returns

a copy of the DTO

static from_dict(d: Dict[str, Any]) csle_common.dao.training.alpha_vectors_policy.AlphaVectorsPolicy[source]

Converts a dict representation to an instance

Parameters

d – the dict to convert

Returns

the created instance

static from_json_file(json_file_path: str) csle_common.dao.training.alpha_vectors_policy.AlphaVectorsPolicy[source]

Reads a json file and converts it to a DTO

Parameters

json_file_path – the json file path

Returns

the converted DTO

probability(o: List[Union[int, float]], a: int) float[source]

Calculates the probability of taking a given action for a given observation

Parameters
  • o – the input observation

  • a – the action

Returns

p(a|o)

stage_policy(o: Union[List[Union[int, float]], int, float]) List[List[float]][source]

Gets the stage policy, i.e. a |S|x|A| policy

Parameters

o – the latest observation

Returns

the |S|x|A| stage policy

to_dict() Dict[str, Any][source]
Returns

a dict representation of the policy
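
For intuition, standard alpha-vector action selection (Sondik 1971) evaluates a belief b against each vector and acts according to the maximizing one, V(b) = max_i alpha_i . b. A minimal sketch with hypothetical data (the class above wraps the same idea in CSLE DTOs):

    import numpy as np

    alpha_vectors = np.array([[1.0, 0.0],   # vector associated with action 0
                              [0.2, 0.9]])  # vector associated with action 1
    vector_to_action = [0, 1]

    def alpha_vector_action(belief):
        # act according to the vector that maximizes alpha_i . b
        values = alpha_vectors @ np.asarray(belief)
        return vector_to_action[int(np.argmax(values))]

    print(alpha_vector_action([0.3, 0.7]))  # -> 1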

csle_common.dao.training.dqn_policy module

class csle_common.dao.training.dqn_policy.DQNPolicy(model: Union[None, stable_baselines3.dqn.dqn.DQN, csle_common.models.q_network.QNetwork], simulation_name: str, save_path: str, player_type: csle_common.dao.training.player_type.PlayerType, states: List[csle_common.dao.simulation_config.state.State], actions: List[csle_common.dao.simulation_config.action.Action], experiment_config: csle_common.dao.training.experiment_config.ExperimentConfig, avg_R: float)[source]

Bases: csle_common.dao.training.policy.Policy

A neural network policy learned with DQN

action(o: List[float], deterministic: bool = True) numpy.ndarray[Any, numpy.dtype[Any]][source]

Selects the next action

Parameters
  • o – the current observation

  • deterministic – boolean flag indicating whether the action selection should be deterministic

Returns

the selected action

copy() csle_common.dao.training.dqn_policy.DQNPolicy[source]
Returns

a copy of the DTO

static from_dict(d: Dict[str, Any]) csle_common.dao.training.dqn_policy.DQNPolicy[source]

Converts a dict representation of the object to an instance

Parameters

d – the dict to convert

Returns

the created instance

static from_json_file(json_file_path: str) csle_common.dao.training.dqn_policy.DQNPolicy[source]

Reads a json file and converts it to a DTO

Parameters

json_file_path – the json file path

Returns

the converted DTO

probability(o: List[float], a) int[source]

Calculates the probability of taking a given action for a given observation

Parameters
  • o – the current observation

  • a – the action

Returns

the probability of the action

stage_policy(o: Union[List[int], List[float]]) List[List[float]][source]

Gets the stage policy, i.e. a |S|x|A| policy

Parameters

o – the latest observation

Returns

the |S|x|A| stage policy

to_dict() Dict[str, Any][source]
Returns

a dict representation of the policy
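
A typical usage pattern, assuming a DQNPolicy was previously serialized to JSON (the file path and observation below are hypothetical):

    from csle_common.dao.training.dqn_policy import DQNPolicy

    policy = DQNPolicy.from_json_file("/tmp/dqn_policy.json")  # hypothetical path
    o = [0.0, 0.5, 1.0]  # example observation; the expected shape depends on the simulation
    a = policy.action(o=o, deterministic=True)
    print(a)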

csle_common.dao.training.experiment_config module

class csle_common.dao.training.experiment_config.ExperimentConfig(output_dir: str, title: str, random_seeds: List[int], agent_type: csle_common.dao.training.agent_type.AgentType, hparams: Dict[str, csle_common.dao.training.hparam.HParam], log_every: int, player_type: csle_common.dao.training.player_type.PlayerType, player_idx: int, br_log_every: int = 10)[source]

Bases: csle_base.json_serializable.JSONSerializable

DTO representing the configuration of an experiment

static from_dict(d: Dict[str, Any]) csle_common.dao.training.experiment_config.ExperimentConfig[source]

Converts a dict representation to an instance

Parameters

d – the dict to convert

Returns

the created instance

static from_json_file(json_file_path: str) csle_common.dao.training.experiment_config.ExperimentConfig[source]

Reads a json file and converts it to a DTO

Parameters

json_file_path – the json file path

Returns

the converted DTO

to_dict() Dict[str, Any][source]

Converts the object to a dict representation

Returns

a dict representation of the object
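
A sketch of constructing an ExperimentConfig from the documented constructor; the hyperparameter names ("gamma", "N") and values are illustrative, not a fixed schema:

    from csle_common.dao.training.agent_type import AgentType
    from csle_common.dao.training.experiment_config import ExperimentConfig
    from csle_common.dao.training.hparam import HParam
    from csle_common.dao.training.player_type import PlayerType

    hparams = {
        "gamma": HParam(value=0.99, name="gamma", descr="discount factor"),
        "N": HParam(value=1000, name="N", descr="number of training iterations"),
    }
    config = ExperimentConfig(
        output_dir="/tmp/ppo_experiment", title="PPO training run",
        random_seeds=[399, 98912], agent_type=AgentType.PPO, hparams=hparams,
        log_every=1, player_type=PlayerType.DEFENDER, player_idx=0)
    print(config.to_dict())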

csle_common.dao.training.experiment_execution module

class csle_common.dao.training.experiment_execution.ExperimentExecution(config: csle_common.dao.training.experiment_config.ExperimentConfig, result: csle_common.dao.training.experiment_result.ExperimentResult, timestamp: float, emulation_name: str, simulation_name: str, descr: str, log_file_path: str)[source]

Bases: csle_base.json_serializable.JSONSerializable

DTO representing an experiment execution

static from_dict(d: Dict[str, Any]) csle_common.dao.training.experiment_execution.ExperimentExecution[source]

Converts a dict representation of the object to an instance

Parameters

d – the dict to convert

Returns

the created instance

static from_json_file(json_file_path: str) csle_common.dao.training.experiment_execution.ExperimentExecution[source]

Reads a json file and converts it to a DTO

Parameters

json_file_path – the json file path

Returns

the converted DTO

to_dict() Dict[str, Any][source]

Converts the object to a dict representation

Returns

a dict representation of the object

csle_common.dao.training.experiment_result module

class csle_common.dao.training.experiment_result.ExperimentResult[source]

Bases: csle_base.json_serializable.JSONSerializable

DTO representing the results of an experiment

static from_dict(d: Dict[str, Any]) csle_common.dao.training.experiment_result.ExperimentResult[source]

Converts a dict representation to an instance

Parameters

d – the dict to convert

Returns

the created instance

static from_json_file(json_file_path: str) csle_common.dao.training.experiment_result.ExperimentResult[source]

Reads a json file and converts it to a DTO

Parameters

json_file_path – the json file path

Returns

the converted DTO

to_dict() Dict[str, Any][source]

Converts the object to a dict representation

Returns

a dict representation of the object

csle_common.dao.training.fnn_with_softmax_policy module

class csle_common.dao.training.fnn_with_softmax_policy.FNNWithSoftmaxPolicy(policy_network, simulation_name: str, save_path: str, player_type: csle_common.dao.training.player_type.PlayerType, states: List[csle_common.dao.simulation_config.state.State], actions: List[csle_common.dao.simulation_config.action.Action], experiment_config: csle_common.dao.training.experiment_config.ExperimentConfig, avg_R: float, input_dim: int, output_dim: int)[source]

Bases: csle_common.dao.training.policy.Policy

A feed-forward neural network policy with softmax output

action(o: numpy.ndarray[Any, numpy.dtype[Any]], deterministic: bool = True) Any[source]

Selects the next action

Parameters
  • o – the current observation

  • deterministic – boolean flag indicating whether the action selection should be deterministic

Returns

the selected action

copy() csle_common.dao.training.fnn_with_softmax_policy.FNNWithSoftmaxPolicy[source]
Returns

a copy of the DTO

static from_dict(d: Dict[str, Any]) csle_common.dao.training.fnn_with_softmax_policy.FNNWithSoftmaxPolicy[source]

Converts a dict representation of the object to an instance

Parameters

d – the dict to convert

Returns

the created instance

static from_json_file(json_file_path: str) csle_common.dao.training.fnn_with_softmax_policy.FNNWithSoftmaxPolicy[source]

Reads a json file and converts it to a DTO

Parameters

json_file_path – the json file path

Returns

the converted DTO

get_action_and_log_prob(state: numpy.ndarray[Any, numpy.dtype[Any]]) Tuple[int, float][source]

Samples an action from the policy network

Parameters

state – the state to sample an action for

Returns

The sampled action id and the log probability

probability(o: numpy.ndarray[Any, numpy.dtype[Any]], a: int) Union[int, float][source]

Calculates the probability of taking a given action for a given observation

Parameters
  • o – the current observation

  • a – the action

Returns

the probability of the action

save_policy_network() None[source]

Saves the PyTorch model weights

Returns

None

stage_policy(o: Union[List[int], List[float]]) List[List[float]][source]

Gets the stage policy, i.e. a |S|x|A| policy

Parameters

o – the latest observation

Returns

the |S|x|A| stage policy

to_dict() Dict[str, Any][source]
Returns

a dict representation of the policy

csle_common.dao.training.hparam module

class csle_common.dao.training.hparam.HParam(value: Union[int, float, str, List[Any]], name: str, descr: str)[source]

Bases: csle_base.json_serializable.JSONSerializable

DTO class representing a hyperparameter

static from_dict(d: Dict[str, Any]) csle_common.dao.training.hparam.HParam[source]

Creates an instance from a dict representation

Parameters

d – the dict representation

Returns

the instance

static from_json_file(json_file_path: str) csle_common.dao.training.hparam.HParam[source]

Reads a json file and converts it to a DTO

Parameters

json_file_path – the json file path

Returns

the converted DTO

to_dict() Dict[str, Any][source]

Converts the object to a dict representation

Returns

a dict representation of the object
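
All DTOs in this package share the same serialization round trip via to_dict()/from_dict(); HParam is small enough to show it directly (assuming, as is the convention for these DTOs, that constructor arguments are stored as attributes of the same name):

    from csle_common.dao.training.hparam import HParam

    hp = HParam(value=0.001, name="learning_rate", descr="step size of the optimizer")
    d = hp.to_dict()            # serialize to a plain dict
    hp2 = HParam.from_dict(d)   # reconstruct the DTO
    assert hp2.name == hp.name and hp2.value == hp.value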

csle_common.dao.training.linear_tabular_policy module

class csle_common.dao.training.linear_tabular_policy.LinearTabularPolicy(stopping_policy: csle_common.dao.training.linear_threshold_stopping_policy.LinearThresholdStoppingPolicy, action_policy: csle_common.dao.training.tabular_policy.TabularPolicy, simulation_name: str, states: List[csle_common.dao.simulation_config.state.State], player_type: csle_common.dao.training.player_type.PlayerType, actions: List[csle_common.dao.simulation_config.action.Action], experiment_config: Optional[csle_common.dao.training.experiment_config.ExperimentConfig], avg_R: float, agent_type: csle_common.dao.training.agent_type.AgentType)[source]

Bases: csle_common.dao.training.policy.Policy

A linear tabular policy that uses a linear threshold to decide when to take an action and a tabular policy to decide which action to take

action(o: List[float], deterministic: bool = True) Union[int, List[int], float, numpy.ndarray[Any, numpy.dtype[Any]]][source]

Selects the next action

Parameters
  • o – the current observation

  • deterministic – boolean flag indicating whether the action selection should be deterministic

Returns

the selected action

copy() csle_common.dao.training.linear_tabular_policy.LinearTabularPolicy[source]
Returns

a copy of the DTO

static from_dict(d: Dict[str, Any]) csle_common.dao.training.linear_tabular_policy.LinearTabularPolicy[source]

Converts a dict representation of the object to an instance

Parameters

d – the dict to convert

Returns

the created instance

static from_json_file(json_file_path: str) csle_common.dao.training.linear_tabular_policy.LinearTabularPolicy[source]

Reads a json file and converts it to a DTO

Parameters

json_file_path – the json file path

Returns

the converted DTO

probability(o: List[float], a: int) int[source]

Probability of a given action

Parameters
  • o – the current observation

  • a – a given action

Returns

the probability of a

stage_policy(o: Any) Any[source]

Gets the stage policy, i.e. a |S|x|A| policy

Parameters

o – the latest observation

Returns

the |S|x|A| stage policy

to_dict() Dict[str, List[float]][source]
Returns

a dict representation of the policy

csle_common.dao.training.linear_threshold_stopping_policy module

class csle_common.dao.training.linear_threshold_stopping_policy.LinearThresholdStoppingPolicy(theta, simulation_name: str, L: int, states: List[csle_common.dao.simulation_config.state.State], player_type: csle_common.dao.training.player_type.PlayerType, actions: List[csle_common.dao.simulation_config.action.Action], experiment_config: Optional[csle_common.dao.training.experiment_config.ExperimentConfig], avg_R: float, agent_type: csle_common.dao.training.agent_type.AgentType, opponent_strategy: Optional[csle_common.dao.training.linear_threshold_stopping_policy.LinearThresholdStoppingPolicy] = None)[source]

Bases: csle_common.dao.training.policy.Policy

A linear threshold stopping policy

action(o: List[float], deterministic: bool = True) int[source]

Selects the next action

Parameters
  • o – the current observation

  • deterministic – boolean flag indicating whether the action selection should be deterministic

Returns

the selected action

copy() csle_common.dao.training.linear_threshold_stopping_policy.LinearThresholdStoppingPolicy[source]
Returns

a copy of the DTO

static from_dict(d: Dict[str, Any]) csle_common.dao.training.linear_threshold_stopping_policy.LinearThresholdStoppingPolicy[source]

Converts a dict representation of the object to an instance

Parameters

d – the dict to convert

Returns

the created instance

static from_json_file(json_file_path: str) csle_common.dao.training.linear_threshold_stopping_policy.LinearThresholdStoppingPolicy[source]

Reads a json file and converts it to a DTO

Parameters

json_file_path – the json file path

Returns

the converted DTO

probability(o: List[float], a: int) float[source]

Probability of a given action

Parameters
  • o – the current observation

  • a – a given action

Returns

the probability of a

stage_policy(o: Union[List[Union[int, float]], int, float]) List[List[float]][source]

Gets the stage policy, i.e. a |S|x|A| policy

Parameters

o – the latest observation

Returns

the |S|x|A| stage policy

to_dict() Dict[str, Any][source]
Returns

a dict representation of the policy
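
The decision rule behind such a policy can be sketched as stopping when a linear function of the observation crosses a threshold; the exact parameterization used by LinearThresholdStoppingPolicy (the meaning of theta) may differ from this minimal illustration:

    import numpy as np

    theta = np.array([2.0, -1.0, 0.5])  # hypothetical threshold parameters

    def linear_threshold_action(o):
        # 1 = stop, 0 = continue; stop when theta^T o crosses zero
        return 1 if float(np.dot(theta, o)) >= 0.0 else 0

    print(linear_threshold_action([1.0, 1.0, 1.0]))  # -> 1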

csle_common.dao.training.mixed_linear_tabular module

class csle_common.dao.training.mixed_linear_tabular.MixedLinearTabularPolicy(simulation_name: str, player_type: csle_common.dao.training.player_type.PlayerType, states: List[csle_common.dao.simulation_config.state.State], actions: List[csle_common.dao.simulation_config.action.Action], experiment_config: csle_common.dao.training.experiment_config.ExperimentConfig, avg_R: float)[source]

Bases: csle_common.dao.training.policy.Policy

A mixed policy using an ensemble of linear tabular policies

action(o: List[float], deterministic: bool = True) Union[int, List[int], float, numpy.ndarray[Any, numpy.dtype[Any]]][source]

Selects the next action

Parameters
  • o – the current observation

  • deterministic – boolean flag indicating whether the action selection should be deterministic

Returns

the selected action

copy() csle_common.dao.training.mixed_linear_tabular.MixedLinearTabularPolicy[source]
Returns

a copy of the DTO

static from_dict(d: Dict[str, Any]) csle_common.dao.training.mixed_linear_tabular.MixedLinearTabularPolicy[source]

Converts a dict representation of the object to an instance

Parameters

d – the dict to convert

Returns

the created instance

static from_json_file(json_file_path: str) csle_common.dao.training.mixed_linear_tabular.MixedLinearTabularPolicy[source]

Reads a json file and converts it to a DTO

Parameters

json_file_path – the json file path

Returns

the converted DTO

probability(o: List[float], a: int) int[source]

Probability of a given action

Parameters
  • o – the current observation

  • a – a given action

Returns

the probability of a

stage_policy(o: Union[List[Union[int, float]], int, float]) List[List[float]][source]

Returns the stage policy for a given observation

Parameters

o – the observation to return the stage policy for

Returns

the stage policy

to_dict() Dict[str, Any][source]
Returns

a dict representation of the policy

csle_common.dao.training.mixed_multi_threshold_stopping_policy module

class csle_common.dao.training.mixed_multi_threshold_stopping_policy.MixedMultiThresholdStoppingPolicy(defender_Theta: List[List[List[float]]], attacker_Theta: List[List[List[List[float]]]], simulation_name: str, L: int, states: List[csle_common.dao.simulation_config.state.State], player_type: csle_common.dao.training.player_type.PlayerType, actions: List[csle_common.dao.simulation_config.action.Action], experiment_config: Optional[csle_common.dao.training.experiment_config.ExperimentConfig], avg_R: float, agent_type: csle_common.dao.training.agent_type.AgentType, opponent_strategy: Optional[csle_common.dao.training.mixed_multi_threshold_stopping_policy.MixedMultiThresholdStoppingPolicy] = None)[source]

Bases: csle_common.dao.training.policy.Policy

A mixed multi-threshold stopping policy

action(o: List[float], deterministic: bool = True) int[source]

Selects the next action

Parameters
  • o – the current observation

  • deterministic – boolean flag indicating whether the action selection should be deterministic

Returns

the selected action

copy() csle_common.dao.training.mixed_multi_threshold_stopping_policy.MixedMultiThresholdStoppingPolicy[source]
Returns

a copy of the DTO

static from_dict(d: Dict[str, Any]) csle_common.dao.training.mixed_multi_threshold_stopping_policy.MixedMultiThresholdStoppingPolicy[source]

Converts a dict representation of the object to an instance

Parameters

d – the dict to convert

Returns

the created instance

static from_json_file(json_file_path: str) csle_common.dao.training.mixed_multi_threshold_stopping_policy.MixedMultiThresholdStoppingPolicy[source]

Reads a json file and converts it to a DTO

Parameters

json_file_path – the json file path

Returns

the converted DTO

probability(o: List[float], a: int) int[source]

Probability of a given action

Parameters
  • o – the current observation

  • a – a given action

Returns

the probability of a

stage_policy(o: List[Union[int, float]]) List[List[float]][source]

Returns the stage policy for a given observation

Parameters

o – the observation to return the stage policy for

Returns

the stage policy

stop_distributions() Dict[str, List[float]][source]
Returns

the stop distributions and their names

to_dict() Dict[str, Any][source]
Returns

a dict representation of the policy

csle_common.dao.training.mixed_ppo_policy module

class csle_common.dao.training.mixed_ppo_policy.MixedPPOPolicy(simulation_name: str, player_type: csle_common.dao.training.player_type.PlayerType, states: List[csle_common.dao.simulation_config.state.State], actions: List[csle_common.dao.simulation_config.action.Action], experiment_config: csle_common.dao.training.experiment_config.ExperimentConfig, avg_R: float)[source]

Bases: csle_common.dao.training.policy.Policy

A mixed policy using an ensemble of neural network policies learned through PPO

action(o: List[float], deterministic: bool = True) Union[int, float, numpy.ndarray[Any, numpy.dtype[Any]]][source]

Selects the next action

Parameters
  • o – the current observation

  • deterministic – boolean flag indicating whether the action selection should be deterministic

Returns

the selected action

copy() csle_common.dao.training.mixed_ppo_policy.MixedPPOPolicy[source]
Returns

a copy of the DTO

static from_dict(d: Dict[str, Any]) csle_common.dao.training.mixed_ppo_policy.MixedPPOPolicy[source]

Converts a dict representation of the object to an instance

Parameters

d – the dict to convert

Returns

the created instance

static from_json_file(json_file_path: str) csle_common.dao.training.mixed_ppo_policy.MixedPPOPolicy[source]

Reads a json file and converts it to a DTO

Parameters

json_file_path – the json file path

Returns

the converted DTO

probability(o: List[float], a: int) float[source]

Probability of a given action

Parameters
  • o – the current observation

  • a – a given action

Returns

the probability of a

stage_policy(o: Union[List[int], List[float]]) List[List[float]][source]

Returns the stage policy for a given observation

Parameters

o – the observation to return the stage policy for

Returns

the stage policy

to_dict() Dict[str, Any][source]
Returns

a dict representation of the policy

csle_common.dao.training.multi_threshold_stopping_policy module

class csle_common.dao.training.multi_threshold_stopping_policy.MultiThresholdStoppingPolicy(theta: List[float], simulation_name: str, L: int, states: List[csle_common.dao.simulation_config.state.State], player_type: csle_common.dao.training.player_type.PlayerType, actions: List[csle_common.dao.simulation_config.action.Action], experiment_config: Optional[csle_common.dao.training.experiment_config.ExperimentConfig], avg_R: float, agent_type: csle_common.dao.training.agent_type.AgentType, opponent_strategy: Optional[csle_common.dao.training.multi_threshold_stopping_policy.MultiThresholdStoppingPolicy] = None)[source]

Bases: csle_common.dao.training.policy.Policy

A multi-threshold stopping policy

action(o: List[float], deterministic: bool = True) int[source]

Selects the next action

Parameters
  • o – the current observation

  • deterministic – boolean flag indicating whether the action selection should be deterministic

Returns

the selected action

copy() csle_common.dao.training.multi_threshold_stopping_policy.MultiThresholdStoppingPolicy[source]
Returns

a copy of the DTO

static from_dict(d: Dict[str, Any]) csle_common.dao.training.multi_threshold_stopping_policy.MultiThresholdStoppingPolicy[source]

Converts a dict representation of the object to an instance

Parameters

d – the dict to convert

Returns

the created instance

static from_json_file(json_file_path: str) csle_common.dao.training.multi_threshold_stopping_policy.MultiThresholdStoppingPolicy[source]

Reads a json file and converts it to a DTO

Parameters

json_file_path – the json file path

Returns

the converted DTO

static inverse_sigmoid(y) float[source]

The inverse sigmoid function

Parameters

y – sigmoid(x)

Returns

x, i.e. sigmoid^(-1)(y)

probability(o: List[float], a: int) float[source]

Probability of a given action

Parameters
  • o – the current observation

  • a – a given action

Returns

the probability of a

static sigmoid(x) float[source]

The sigmoid function

Parameters

x – the input

Returns

sigmoid(x)

static smooth_threshold_action_selection(threshold: float, b1: float, threshold_action: int = 1, alternative_action: int = 1, k=-20) Tuple[int, float][source]

Selects the next action according to a smooth threshold function on the belief

Parameters
  • threshold – the threshold

  • b1 – the belief

  • threshold_action – the action to select if the threshold is exceeded

  • alternative_action – the alternative action to select if the threshold is not exceeded

  • k – parameter controlling the smoothness (steepness) of the threshold function

Returns

the selected action and the probability

stage_policy(o: Union[List[int], List[float]]) List[List[float]][source]

Gets the stage policy, i.e. a |S|x|A| policy

Parameters

o – the latest observation

Returns

the |S|x|A| stage policy

stop_distributions() Dict[str, List[float]][source]
Returns

the stop distributions and their names

static stopping_probability(b1, threshold, k=-20) float[source]

Returns the probability of stopping given a belief and a threshold

Parameters
  • b1 – the belief

  • threshold – the threshold

  • k – parameter controlling the smoothness (steepness) of the threshold function

Returns

the stopping probability

thresholds() List[float][source]
Returns

the thresholds

to_dict() Dict[str, Any][source]
Returns

a dict representation of the policy
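
The smooth threshold underlying stopping_probability and smooth_threshold_action_selection can be pictured as a sigmoid-like curve that sharpens as |k| grows. The sketch below shows one standard construction of such a curve for beliefs in (0, 1); the exact expression used by this class may differ:

    import math

    def smooth_stopping_probability(b1: float, threshold: float, k: float = -20) -> float:
        # smooth approximation of a hard threshold at b1 = threshold,
        # valid for 0 < b1 < 1 and 0 < threshold < 1
        odds = (b1 * (1 - threshold)) / (threshold * (1 - b1))
        return 1.0 / (1.0 + math.pow(odds, k))

    print(smooth_stopping_probability(b1=0.8, threshold=0.5))  # close to 1.0
    print(smooth_stopping_probability(b1=0.2, threshold=0.5))  # close to 0.0
    print(smooth_stopping_probability(b1=0.5, threshold=0.5))  # exactly 0.5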

csle_common.dao.training.player_type module

class csle_common.dao.training.player_type.PlayerType(value)[source]

Bases: enum.IntEnum

Enum representing the different player types in CSLE

ATTACKER = 2
DEFENDER = 1
SELF_PLAY = 3

csle_common.dao.training.policy module

class csle_common.dao.training.policy.Policy(agent_type: csle_common.dao.training.agent_type.AgentType, player_type: csle_common.dao.training.player_type.PlayerType)[source]

Bases: csle_base.json_serializable.JSONSerializable

An abstract class representing a policy

abstract action(o: Any, deterministic: bool) Union[int, List[int], float, numpy.ndarray[Any, numpy.dtype[Any]]][source]

Calculates the next action

Parameters
  • o – the input observation

  • deterministic – boolean flag indicating whether the action selection should be deterministic

Returns

the action

abstract copy() csle_common.dao.training.policy.Policy[source]
Returns

a copy of the object

abstract static from_dict(d: Dict[str, Any]) csle_common.dao.training.policy.Policy[source]

Converts a dict representation of the object to an instance

Parameters

d – the dict representation to convert

Returns

the converted object

abstract static from_json_file(json_file_path: str) csle_common.dao.training.policy.Policy[source]

Reads a json file and converts it to a DTO

Parameters

json_file_path – the json file path

Returns

the converted DTO

abstract probability(o: Any, a: int) Union[int, float][source]

Calculates the probability of a given action for a given observation

Parameters
  • o – the observation

  • a – the action

Returns

the probability

abstract stage_policy(o: Any) Union[List[List[float]], List[float]][source]

Returns a stage policy (see Horak & Bosansky 2019)

Parameters

o – the observation for the stage

Returns

the stage policy

abstract to_dict() Dict[str, Any][source]

Converts the object to a dict representation

Returns

a dict representation of the object
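
A minimal sketch of a concrete subclass, showing which abstract methods must be implemented; UniformPolicy is a hypothetical example, not part of the package:

    import json
    import random
    from typing import Any, Dict, List
    from csle_common.dao.training.agent_type import AgentType
    from csle_common.dao.training.player_type import PlayerType
    from csle_common.dao.training.policy import Policy

    class UniformPolicy(Policy):
        """Hypothetical policy that selects uniformly among num_actions actions"""

        def __init__(self, num_actions: int) -> None:
            super().__init__(agent_type=AgentType.RANDOM, player_type=PlayerType.DEFENDER)
            self.num_actions = num_actions

        def action(self, o: Any, deterministic: bool) -> int:
            return 0 if deterministic else random.randrange(self.num_actions)

        def probability(self, o: Any, a: int) -> float:
            return 1.0 / self.num_actions

        def stage_policy(self, o: Any) -> List[List[float]]:
            return [[1.0 / self.num_actions] * self.num_actions]

        def copy(self) -> "UniformPolicy":
            return UniformPolicy(num_actions=self.num_actions)

        def to_dict(self) -> Dict[str, Any]:
            return {"num_actions": self.num_actions}

        @staticmethod
        def from_dict(d: Dict[str, Any]) -> "UniformPolicy":
            return UniformPolicy(num_actions=d["num_actions"])

        @staticmethod
        def from_json_file(json_file_path: str) -> "UniformPolicy":
            with open(json_file_path, "r") as f:
                return UniformPolicy.from_dict(json.load(f))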

csle_common.dao.training.policy_type module

class csle_common.dao.training.policy_type.PolicyType(value)[source]

Bases: enum.IntEnum

Enum representing the different policy types in CSLE

ALPHA_VECTORS = 1
C51 = 13
DQN = 2
FNN_W_SOFTMAX = 3
LINEAR_TABULAR = 11
LINEAR_THRESHOLD = 9
MIXED_LINEAR_TABULAR = 12
MIXED_MULTI_THRESHOLD = 4
MIXED_PPO_POLICY = 10
MULTI_THRESHOLD = 8
PPO = 5
RANDOM = 6
TABULAR = 0
VECTOR = 7

csle_common.dao.training.ppo_policy module

class csle_common.dao.training.ppo_policy.PPOPolicy(model: Union[None, stable_baselines3.ppo.ppo.PPO, csle_common.models.ppo_network.PPONetwork], simulation_name: str, save_path: str, player_type: csle_common.dao.training.player_type.PlayerType, states: List[csle_common.dao.simulation_config.state.State], actions: List[csle_common.dao.simulation_config.action.Action], experiment_config: csle_common.dao.training.experiment_config.ExperimentConfig, avg_R: float)[source]

Bases: csle_common.dao.training.policy.Policy

A neural network policy learned with PPO

action(o: Union[List[float], List[int]], deterministic: bool = True) Union[int, float, numpy.ndarray[Any, numpy.dtype[Any]]][source]

Selects the next action

Parameters
  • o – the current observation

  • deterministic – boolean flag indicating whether the action selection should be deterministic

Returns

the selected action

copy() csle_common.dao.training.ppo_policy.PPOPolicy[source]
Returns

a copy of the DTO

static from_dict(d: Dict[str, Any]) csle_common.dao.training.ppo_policy.PPOPolicy[source]

Converts a dict representation of the object to an instance

Parameters

d – the dict to convert

Returns

the created instance

static from_json_file(json_file_path: str) csle_common.dao.training.ppo_policy.PPOPolicy[source]

Reads a json file and converts it to a DTO

Parameters

json_file_path – the json file path

Returns

the converted DTO

load() None[source]

Attempts to load the policy from disk

Returns

None

probability(o: Union[List[float], List[int]], a: int) float[source]

Calculates the probability of taking a given action for a given observation

Parameters
  • o – the current observation

  • a – the action

Returns

the probability of the action

stage_policy(o: Union[List[int], List[float]]) List[List[float]][source]

Gets the stage policy, i.e. a |S|x|A| policy

Parameters

o – the latest observation

Returns

the |S|x|A| stage policy

to_dict() Dict[str, Any][source]
Returns

a dict representation of the policy

value(o: Union[List[float], List[int]]) float[source]

Gets the value of a given observation, computed by the critic network

Parameters

o – the observation to get the value of

Returns

V(o)
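
A typical usage pattern, assuming a PPOPolicy was previously serialized to JSON (the file path and observation below are hypothetical):

    from csle_common.dao.training.ppo_policy import PPOPolicy

    policy = PPOPolicy.from_json_file("/tmp/ppo_policy.json")  # hypothetical path
    o = [0.5, 0.2]  # example observation
    a = policy.action(o=o, deterministic=True)  # select an action
    v = policy.value(o=o)                       # critic estimate V(o)
    print(a, v)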

csle_common.dao.training.random_policy module

class csle_common.dao.training.random_policy.RandomPolicy(player_type: csle_common.dao.training.player_type.PlayerType, actions: List[csle_common.dao.simulation_config.action.Action], stage_policy_tensor: Optional[List[List[float]]])[source]

Bases: csle_common.dao.training.policy.Policy

Object representing a static policy

action(o: Union[List[Union[int, float]], int, float], deterministic: bool = True) int[source]

Selects the next action

Parameters
  • o – the input observation

  • deterministic – boolean flag indicating whether the action selection should be deterministic

Returns

the next action

copy() csle_common.dao.training.random_policy.RandomPolicy[source]
Returns

a copy of the DTO

static from_dict(d: Dict[str, Any]) csle_common.dao.training.random_policy.RandomPolicy[source]

Converts a dict representation to an instance

Parameters

d – the dict to convert

Returns

the created instance

static from_json_file(json_file_path: str) csle_common.dao.training.random_policy.RandomPolicy[source]

Reads a json file and converts it to a DTO

Parameters

json_file_path – the json file path

Returns

the converted DTO

probability(o: Union[List[Union[int, float]], int, float], a: int) float[source]

Calculates the probability of taking a given action for a given observation

Parameters
  • o – the input observation

  • a – the action

Returns

p(a|o)

stage_policy(o: Any) Union[List[List[float]], List[float]][source]

Gets the stage policy, i.e. a |S|x|A| policy

Parameters

o – the latest observation

Returns

the |S|x|A| stage policy

to_dict() Dict[str, Any][source]
Returns

a dict representation of the policy

csle_common.dao.training.tabular_policy module

class csle_common.dao.training.tabular_policy.TabularPolicy(player_type: csle_common.dao.training.player_type.PlayerType, actions: List[csle_common.dao.simulation_config.action.Action], lookup_table: List[List[float]], agent_type: csle_common.dao.training.agent_type.AgentType, simulation_name: str, avg_R: float, value_function: Optional[List[Any]] = None, q_table: Optional[List[Any]] = None)[source]

Bases: csle_common.dao.training.policy.Policy

Object representing a tabular policy

action(o: Union[int, float], deterministic: bool = True) Union[int, List[int], float, numpy.ndarray[Any, numpy.dtype[Any]]][source]

Selects the next action

Parameters
  • o – the input observation

  • deterministic – boolean flag indicating whether the action selection should be deterministic

Returns

the next action

copy() csle_common.dao.training.tabular_policy.TabularPolicy[source]
Returns

a copy of the DTO

static from_dict(d: Dict[str, Any]) csle_common.dao.training.tabular_policy.TabularPolicy[source]

Converts a dict representation to an instance

Parameters

d – the dict to convert

Returns

the created instance

static from_json_file(json_file_path: str) csle_common.dao.training.tabular_policy.TabularPolicy[source]

Reads a json file and converts it to a DTO

Parameters

json_file_path – the json file path

Returns

the converted DTO

probability(o: Union[int, float], a: int) float[source]

Calculates the probability of taking a given action for a given observation

Parameters
  • o – the input observation

  • a – the action

Returns

p(a|o)

stage_policy(o: Union[List[Union[int, float]], int, float]) List[List[float]][source]

Gets the stage policy, i.e. a |S|x|A| policy

Parameters

o – the latest observation

Returns

the |S|x|A| stage policy

to_dict() Dict[str, Any][source]
Returns

a dict representation of the policy
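
A sketch of constructing a small TabularPolicy with one row of action probabilities per state; it assumes Action takes id and descr arguments, as elsewhere in csle_common, and the agent type is arbitrary here:

    from csle_common.dao.simulation_config.action import Action
    from csle_common.dao.training.agent_type import AgentType
    from csle_common.dao.training.player_type import PlayerType
    from csle_common.dao.training.tabular_policy import TabularPolicy

    actions = [Action(id=0, descr="continue"), Action(id=1, descr="stop")]
    lookup_table = [[0.9, 0.1],   # state 0: mostly continue
                    [0.2, 0.8]]   # state 1: mostly stop
    policy = TabularPolicy(
        player_type=PlayerType.DEFENDER, actions=actions, lookup_table=lookup_table,
        agent_type=AgentType.VALUE_ITERATION, simulation_name="example-simulation",
        avg_R=0.0)
    print(policy.probability(o=1, a=1))  # expected: 0.8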

csle_common.dao.training.vector_policy module

class csle_common.dao.training.vector_policy.VectorPolicy(player_type: csle_common.dao.training.player_type.PlayerType, actions: List[int], policy_vector: List[float], agent_type: csle_common.dao.training.agent_type.AgentType, simulation_name: str, avg_R: float)[source]

Bases: csle_common.dao.training.policy.Policy

Object representing a vector policy

action(o: Union[List[Union[int, float]], int, float], deterministic: bool = True) Union[int, List[int], float, numpy.ndarray[Any, numpy.dtype[Any]]][source]

Selects the next action

Parameters
  • o – the input observation

  • deterministic – boolean flag indicating whether the action selection should be deterministic

Returns

the next action

copy() csle_common.dao.training.vector_policy.VectorPolicy[source]
Returns

a copy of the DTO

static from_dict(d: Dict[str, Any]) csle_common.dao.training.vector_policy.VectorPolicy[source]

Converts a dict representation to an instance

Parameters

d – the dict to convert

Returns

the created instance

static from_json_file(json_file_path: str) csle_common.dao.training.vector_policy.VectorPolicy[source]

Reads a json file and converts it to a DTO

Parameters

json_file_path – the json file path

Returns

the converted DTO

probability(o: Union[List[Union[int, float]], int, float], a: int) float[source]

Calculates the probability of taking a given action for a given observation

Parameters
  • o – the input observation

  • a – the action

Returns

p(a|o)

stage_policy(o: Union[List[Union[int, float]], int, float]) List[float][source]

Gets the stage policy, i.e. the action distribution for the stage

Parameters

o – the latest observation

Returns

the stage policy (a vector of action probabilities)

to_dict() Dict[str, Any][source]

Gets a dict representation of the object

Returns

A dict representation of the object

Module contents