csle_common.dao.training package
Submodules
csle_common.dao.training.agent_type module
- class csle_common.dao.training.agent_type.AgentType(value)[source]
Bases:
enum.IntEnum
Enum representing the different agent types in CSLE
- BAYESIAN_OPTIMIZATION = 23
- BAYESIAN_OPTIMIZATION_EMUKIT = 26
- C51_CLEAN = 34
- CMA_ES = 28
- CROSS_ENTROPY = 13
- DFSP_LOCAL = 24
- DIFFERENTIAL_EVOLUTION = 12
- DQN = 3
- DQN_CLEAN = 33
- DYNA_SEC = 22
- FICTITIOUS_PLAY = 20
- HSVI = 9
- HSVI_OS_POSG = 19
- KIEFER_WOLFOWITZ = 14
- LINEAR_PROGRAMMING_CMDP = 25
- LINEAR_PROGRAMMING_NORMAL_FORM = 21
- MCS = 36
- NELDER_MEAD = 29
- NFSP = 5
- NONE = 7
- PARTICLE_SWARM = 30
- POLICY_ITERATION = 17
- POMCP = 32
- PPG_CLEAN = 35
- PPO = 1
- PPO_CLEAN = 31
- Q_LEARNING = 15
- RANDOM = 6
- RANDOM_SEARCH = 11
- REINFORCE = 4
- SARSA = 16
- SHAPLEY_ITERATION = 18
- SIMULATED_ANNEALING = 27
- SONDIK_VALUE_ITERATION = 10
- T_FP = 2
- T_SPSA = 0
- VALUE_ITERATION = 8
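A minimal usage sketch: since AgentType is an IntEnum, members compare and serialize as plain integers, which is how they appear in experiment configurations.

```python
from csle_common.dao.training.agent_type import AgentType

agent_type = AgentType.PPO
assert int(agent_type) == 1                    # IntEnum members behave as ints
assert AgentType(15) is AgentType.Q_LEARNING   # and can be recovered from ints
```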
csle_common.dao.training.alpha_vectors_policy module
- class csle_common.dao.training.alpha_vectors_policy.AlphaVectorsPolicy(player_type: csle_common.dao.training.player_type.PlayerType, actions: List[csle_common.dao.simulation_config.action.Action], alpha_vectors: List[Any], transition_tensor: List[Any], reward_tensor: List[Any], states: List[csle_common.dao.simulation_config.state.State], agent_type: csle_common.dao.training.agent_type.AgentType, simulation_name: str, avg_R: float)[source]
Bases:
csle_common.dao.training.policy.Policy
Object representing a policy based on alpha vectors for a POMDP (Sondik 1971)
- action(o: List[Union[int, float]], deterministic: bool = True) int [source]
Selects the next action
- Parameters
o – the belief
deterministic – boolean flag indicating whether the action selection should be deterministic
- Returns
the next action
- copy() csle_common.dao.training.alpha_vectors_policy.AlphaVectorsPolicy [source]
- Returns
a copy of the DTO
- static from_dict(d: Dict[str, Any]) csle_common.dao.training.alpha_vectors_policy.AlphaVectorsPolicy [source]
Converts a dict representation to an instance
- Parameters
d – the dict to convert
- Returns
the created instance
- static from_json_file(json_file_path: str) csle_common.dao.training.alpha_vectors_policy.AlphaVectorsPolicy [source]
Reads a json file and converts it to a DTO
- Parameters
json_file_path – the json file path
- Returns
the converted DTO
- probability(o: List[Union[int, float]], a: int) float [source]
Calculates the probability of taking a given action for a given observation
- Parameters
o – the input observation
a – the action
- Returns
p(a|o)
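For context, a brief standalone sketch of the alpha-vector idea (Sondik 1971) that this policy implements: the value of a belief is the maximum inner product over the alpha vectors, and the maximizing vector identifies the action. This illustrates the principle only, not the class internals.

```python
import numpy as np

alpha_vectors = np.array([[1.0, 0.0],     # vector associated with action 0
                          [0.2, 0.9]])    # vector associated with action 1
b = np.array([0.3, 0.7])                  # belief over two states
best = int(np.argmax(alpha_vectors @ b))  # index of the maximizing vector
value = float(alpha_vectors[best] @ b)    # V(b) = max_alpha alpha . b
```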
csle_common.dao.training.dqn_policy module
- class csle_common.dao.training.dqn_policy.DQNPolicy(model: Union[None, stable_baselines3.dqn.dqn.DQN, csle_common.models.q_network.QNetwork], simulation_name: str, save_path: str, player_type: csle_common.dao.training.player_type.PlayerType, states: List[csle_common.dao.simulation_config.state.State], actions: List[csle_common.dao.simulation_config.action.Action], experiment_config: csle_common.dao.training.experiment_config.ExperimentConfig, avg_R: float)[source]
Bases:
csle_common.dao.training.policy.Policy
A neural network policy learned with DQN
- action(o: List[float], deterministic: bool = True) numpy.ndarray[Any, numpy.dtype[Any]] [source]
Selects the next action
- Parameters
o – the current observation
deterministic – boolean flag indicating whether the action selection should be deterministic
- Returns
the selected action
- copy() csle_common.dao.training.dqn_policy.DQNPolicy [source]
- Returns
a copy of the DTO
- static from_dict(d: Dict[str, Any]) csle_common.dao.training.dqn_policy.DQNPolicy [source]
Converts a dict representation of the object to an instance
- Parameters
d – the dict to convert
- Returns
the created instance
- static from_json_file(json_file_path: str) csle_common.dao.training.dqn_policy.DQNPolicy [source]
Reads a json file and converts it to a DTO
- Parameters
json_file_path – the json file path
- Returns
the converted DTO
- probability(o: List[float], a) int [source]
Calculates the probability of taking a given action for a given observation
- Parameters
o – the current observation
a – the action
- Returns
the probability of taking the given action
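A minimal usage sketch based on the documented signatures; the file path is hypothetical.

```python
from csle_common.dao.training.dqn_policy import DQNPolicy

# /tmp/dqn_policy.json is a hypothetical path to a serialized policy DTO
policy = DQNPolicy.from_json_file("/tmp/dqn_policy.json")
a = policy.action(o=[0.1, 0.5, 0.2], deterministic=True)  # greedy action
```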
csle_common.dao.training.experiment_config module
- class csle_common.dao.training.experiment_config.ExperimentConfig(output_dir: str, title: str, random_seeds: List[int], agent_type: csle_common.dao.training.agent_type.AgentType, hparams: Dict[str, csle_common.dao.training.hparam.HParam], log_every: int, player_type: csle_common.dao.training.player_type.PlayerType, player_idx: int, br_log_every: int = 10)[source]
Bases:
csle_base.json_serializable.JSONSerializable
DTO representing the configuration of an experiment
- static from_dict(d: Dict[str, Any]) csle_common.dao.training.experiment_config.ExperimentConfig [source]
Converts a dict representation to an instance
- Parameters
d – the dict to convert
- Returns
the created instance
- static from_json_file(json_file_path: str) csle_common.dao.training.experiment_config.ExperimentConfig [source]
Reads a json file and converts it to a DTO
- Parameters
json_file_path – the json file path
- Returns
the converted DTO
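A construction sketch using the documented constructor; PlayerType.DEFENDER is assumed to be a member of the PlayerType enum.

```python
from csle_common.dao.training.agent_type import AgentType
from csle_common.dao.training.experiment_config import ExperimentConfig
from csle_common.dao.training.hparam import HParam
from csle_common.dao.training.player_type import PlayerType

config = ExperimentConfig(
    output_dir="/tmp/ppo_experiment",      # hypothetical output directory
    title="PPO training example",
    random_seeds=[42, 1337],
    agent_type=AgentType.PPO,
    hparams={"learning_rate": HParam(value=3e-4, name="learning_rate",
                                     descr="optimizer step size")},
    log_every=100,
    player_type=PlayerType.DEFENDER,       # assumed enum member
    player_idx=0,
)
```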
csle_common.dao.training.experiment_execution module
- class csle_common.dao.training.experiment_execution.ExperimentExecution(config: csle_common.dao.training.experiment_config.ExperimentConfig, result: csle_common.dao.training.experiment_result.ExperimentResult, timestamp: float, emulation_name: str, simulation_name: str, descr: str, log_file_path: str)[source]
Bases:
csle_base.json_serializable.JSONSerializable
DTO representing an experiment execution
- static from_dict(d: Dict[str, Any]) csle_common.dao.training.experiment_execution.ExperimentExecution [source]
Converts a dict representation of the object to an instance
- Parameters
d – the dict to convert
- Returns
the created instance
- static from_json_file(json_file_path: str) csle_common.dao.training.experiment_execution.ExperimentExecution [source]
Reads a json file and converts it to a DTO
- Parameters
json_file_path – the json file path
- Returns
the converted DTO
csle_common.dao.training.experiment_result module
- class csle_common.dao.training.experiment_result.ExperimentResult[source]
Bases:
csle_base.json_serializable.JSONSerializable
DTO representing the results of an experiment
- static from_dict(d: Dict[str, Any]) csle_common.dao.training.experiment_result.ExperimentResult [source]
Converts a dict representation to an instance
- Parameters
d – the dict to convert
- Returns
the created instance
- static from_json_file(json_file_path: str) csle_common.dao.training.experiment_result.ExperimentResult [source]
Reads a json file and converts it to a DTO
- Parameters
json_file_path – the json file path
- Returns
the converted DTO
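A round-trip sketch; to_dict() is assumed to be provided by the JSONSerializable base class.

```python
from csle_common.dao.training.experiment_result import ExperimentResult

result = ExperimentResult()
restored = ExperimentResult.from_dict(result.to_dict())  # dict round trip
```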
csle_common.dao.training.fnn_with_softmax_policy module
- class csle_common.dao.training.fnn_with_softmax_policy.FNNWithSoftmaxPolicy(policy_network, simulation_name: str, save_path: str, player_type: csle_common.dao.training.player_type.PlayerType, states: List[csle_common.dao.simulation_config.state.State], actions: List[csle_common.dao.simulation_config.action.Action], experiment_config: csle_common.dao.training.experiment_config.ExperimentConfig, avg_R: float, input_dim: int, output_dim: int)[source]
Bases:
csle_common.dao.training.policy.Policy
A feed-forward neural network policy with softmax output
- action(o: numpy.ndarray[Any, numpy.dtype[Any]], deterministic: bool = True) Any [source]
Selects the next action
- Parameters
o – the current observation
deterministic – boolean flag indicating whether the action selection should be deterministic
- Returns
the selected action
- copy() csle_common.dao.training.fnn_with_softmax_policy.FNNWithSoftmaxPolicy [source]
- Returns
a copy of the DTO
- static from_dict(d: Dict[str, Any]) csle_common.dao.training.fnn_with_softmax_policy.FNNWithSoftmaxPolicy [source]
Converts a dict representation of the object to an instance
- Parameters
d – the dict to convert
- Returns
the created instance
- static from_json_file(json_file_path: str) csle_common.dao.training.fnn_with_softmax_policy.FNNWithSoftmaxPolicy [source]
Reads a json file and converts it to a DTO
- Parameters
json_file_path – the json file path
- Returns
the converted DTO
- get_action_and_log_prob(state: numpy.ndarray[Any, numpy.dtype[Any]]) Tuple[int, float] [source]
Samples an action from the policy network
- Parameters
state – the state to sample an action for
- Returns
The sampled action id and the log probability
- probability(o: numpy.ndarray[Any, numpy.dtype[Any]], a: int) Union[int, float] [source]
Calculates the probability of taking a given action for a given observation
- Parameters
o – the current observation
a – the action
- Returns
the probability of taking the given action
csle_common.dao.training.hparam module
- class csle_common.dao.training.hparam.HParam(value: Union[int, float, str, List[Any]], name: str, descr: str)[source]
Bases:
csle_base.json_serializable.JSONSerializable
DTO class representing a hyperparameter
- static from_dict(d: Dict[str, Any]) csle_common.dao.training.hparam.HParam [source]
Creates an instance from a dict representation
- Parameters
d – the dict representation
- Returns
the instance
- static from_json_file(json_file_path: str) csle_common.dao.training.hparam.HParam [source]
Reads a json file and converts it to a DTO
- Parameters
json_file_path – the json file path
- Returns
the converted DTO
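A small sketch of the hyperparameter DTO; the value attribute and to_dict() are assumed from the constructor signature and the JSONSerializable base.

```python
from csle_common.dao.training.hparam import HParam

gamma = HParam(value=0.99, name="gamma", descr="discount factor")
restored = HParam.from_dict(gamma.to_dict())  # serialize and restore
```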
csle_common.dao.training.linear_tabular_policy module
- class csle_common.dao.training.linear_tabular_policy.LinearTabularPolicy(stopping_policy: csle_common.dao.training.linear_threshold_stopping_policy.LinearThresholdStoppingPolicy, action_policy: csle_common.dao.training.tabular_policy.TabularPolicy, simulation_name: str, states: List[csle_common.dao.simulation_config.state.State], player_type: csle_common.dao.training.player_type.PlayerType, actions: List[csle_common.dao.simulation_config.action.Action], experiment_config: Optional[csle_common.dao.training.experiment_config.ExperimentConfig], avg_R: float, agent_type: csle_common.dao.training.agent_type.AgentType)[source]
Bases:
csle_common.dao.training.policy.Policy
A linear tabular policy that uses a linear threshold line to decide when to take action and a tabular policy to decide which action to take
- action(o: List[float], deterministic: bool = True) Union[int, List[int], float, numpy.ndarray[Any, numpy.dtype[Any]]] [source]
Selects the next action
- Parameters
o – the current observation
deterministic – boolean flag indicating whether the action selection should be deterministic
- Returns
the selected action
- copy() csle_common.dao.training.linear_tabular_policy.LinearTabularPolicy [source]
- Returns
a copy of the DTO
- static from_dict(d: Dict[str, Any]) csle_common.dao.training.linear_tabular_policy.LinearTabularPolicy [source]
Converts a dict representation of the object to an instance
- Parameters
d – the dict to convert
- Returns
the created instance
- static from_json_file(json_file_path: str) csle_common.dao.training.linear_tabular_policy.LinearTabularPolicy [source]
Reads a json file and converts it to a DTO
- Parameters
json_file_path – the json file path
- Returns
the converted DTO
- probability(o: List[float], a: int) int [source]
Probability of a given action
- Parameters
o – the current observation
a – a given action
- Returns
the probability of a
csle_common.dao.training.linear_threshold_stopping_policy module
- class csle_common.dao.training.linear_threshold_stopping_policy.LinearThresholdStoppingPolicy(theta, simulation_name: str, L: int, states: List[csle_common.dao.simulation_config.state.State], player_type: csle_common.dao.training.player_type.PlayerType, actions: List[csle_common.dao.simulation_config.action.Action], experiment_config: Optional[csle_common.dao.training.experiment_config.ExperimentConfig], avg_R: float, agent_type: csle_common.dao.training.agent_type.AgentType, opponent_strategy: Optional[csle_common.dao.training.linear_threshold_stopping_policy.LinearThresholdStoppingPolicy] = None)[source]
Bases:
csle_common.dao.training.policy.Policy
A linear threshold stopping policy
- action(o: List[float], deterministic: bool = True) int [source]
Selects the next action according to the linear threshold stopping policy
- Parameters
o – the current observation
deterministic – boolean flag indicating whether the action selection should be deterministic
- Returns
the selected action
- copy() csle_common.dao.training.linear_threshold_stopping_policy.LinearThresholdStoppingPolicy [source]
- Returns
a copy of the DTO
- static from_dict(d: Dict[str, Any]) csle_common.dao.training.linear_threshold_stopping_policy.LinearThresholdStoppingPolicy [source]
Converts a dict representation of the object to an instance
- Parameters
d – the dict to convert
- Returns
the created instance
- static from_json_file(json_file_path: str) csle_common.dao.training.linear_threshold_stopping_policy.LinearThresholdStoppingPolicy [source]
Reads a json file and converts it to a DTO
- Parameters
json_file_path – the json file path
- Returns
the converted DTO
- probability(o: List[float], a: int) float [source]
Probability of a given action
- Parameters
o – the current observation
a – a given action
- Returns
the probability of a
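For intuition, a generic linear threshold stopping rule; the exact parameterization used by this class may differ.

```python
import math
import random

def linear_threshold_action(theta, o, deterministic=True):
    """Stop (action 1) when a linear score of the observation crosses zero."""
    score = sum(t * x for t, x in zip(theta, o))
    if deterministic:
        return 1 if score >= 0 else 0            # hard threshold
    stop_prob = 1.0 / (1.0 + math.exp(-score))   # smooth (sigmoid) variant
    return 1 if random.random() < stop_prob else 0
```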
csle_common.dao.training.mixed_linear_tabular module
- class csle_common.dao.training.mixed_linear_tabular.MixedLinearTabularPolicy(simulation_name: str, player_type: csle_common.dao.training.player_type.PlayerType, states: List[csle_common.dao.simulation_config.state.State], actions: List[csle_common.dao.simulation_config.action.Action], experiment_config: csle_common.dao.training.experiment_config.ExperimentConfig, avg_R: float)[source]
Bases:
csle_common.dao.training.policy.Policy
A mixed policy using an ensemble of linear tabular policies
- action(o: List[float], deterministic: bool = True) Union[int, List[int], float, numpy.ndarray[Any, numpy.dtype[Any]]] [source]
Selects the next action
- Parameters
o – the current observation
deterministic – boolean flag indicating whether the action selection should be deterministic
- Returns
the selected action
- copy() csle_common.dao.training.mixed_linear_tabular.MixedLinearTabularPolicy [source]
- Returns
a copy of the DTO
- static from_dict(d: Dict[str, Any]) csle_common.dao.training.mixed_linear_tabular.MixedLinearTabularPolicy [source]
Converts a dict representation of the object to an instance
- Parameters
d – the dict to convert
- Returns
the created instance
- static from_json_file(json_file_path: str) csle_common.dao.training.mixed_linear_tabular.MixedLinearTabularPolicy [source]
Reads a json file and converts it to a DTO
- Parameters
json_file_path – the json file path
- Returns
the converted DTO
- probability(o: List[float], a: int) int [source]
Probability of a given action
- Parameters
o – the current observation
a – a given action
- Returns
the probability of a
csle_common.dao.training.mixed_multi_threshold_stopping_policy module
- class csle_common.dao.training.mixed_multi_threshold_stopping_policy.MixedMultiThresholdStoppingPolicy(defender_Theta: List[List[List[float]]], attacker_Theta: List[List[List[List[float]]]], simulation_name: str, L: int, states: List[csle_common.dao.simulation_config.state.State], player_type: csle_common.dao.training.player_type.PlayerType, actions: List[csle_common.dao.simulation_config.action.Action], experiment_config: Optional[csle_common.dao.training.experiment_config.ExperimentConfig], avg_R: float, agent_type: csle_common.dao.training.agent_type.AgentType, opponent_strategy: Optional[csle_common.dao.training.mixed_multi_threshold_stopping_policy.MixedMultiThresholdStoppingPolicy] = None)[source]
Bases:
csle_common.dao.training.policy.Policy
A mixed multi-threshold stopping policy
- action(o: List[float], deterministic: bool = True) int [source]
Selects the next action according to the mixed multi-threshold stopping policy
- Parameters
o – the current observation
deterministic – boolean flag indicating whether the action selection should be deterministic
- Returns
the selected action
- copy() csle_common.dao.training.mixed_multi_threshold_stopping_policy.MixedMultiThresholdStoppingPolicy [source]
- Returns
a copy of the DTO
- static from_dict(d: Dict[str, Any]) csle_common.dao.training.mixed_multi_threshold_stopping_policy.MixedMultiThresholdStoppingPolicy [source]
Converts a dict representation of the object to an instance
- Parameters
d – the dict to convert
- Returns
the created instance
- static from_json_file(json_file_path: str) csle_common.dao.training.mixed_multi_threshold_stopping_policy.MixedMultiThresholdStoppingPolicy [source]
Reads a json file and converts it to a DTO
- Parameters
json_file_path – the json file path
- Returns
the converted DTO
- probability(o: List[float], a: int) int [source]
Probability of a given action
- Parameters
o – the current observation
a – a given action
- Returns
the probability of a
- stage_policy(o: List[Union[int, float]]) List[List[float]] [source]
Returns the stage policy for a given observation
- Parameters
o – the observation to return the stage policy for
- Returns
the stage policy
csle_common.dao.training.mixed_ppo_policy module
- class csle_common.dao.training.mixed_ppo_policy.MixedPPOPolicy(simulation_name: str, player_type: csle_common.dao.training.player_type.PlayerType, states: List[csle_common.dao.simulation_config.state.State], actions: List[csle_common.dao.simulation_config.action.Action], experiment_config: csle_common.dao.training.experiment_config.ExperimentConfig, avg_R: float)[source]
Bases:
csle_common.dao.training.policy.Policy
A mixed policy using an ensemble of neural network policies learned through PPO
- action(o: List[float], deterministic: bool = True) Union[int, float, numpy.ndarray[Any, numpy.dtype[Any]]] [source]
Selects the next action
- Parameters
o – the current observation
deterministic – boolean flag indicating whether the action selection should be deterministic
- Returns
the selected action
- copy() csle_common.dao.training.mixed_ppo_policy.MixedPPOPolicy [source]
- Returns
a copy of the DTO
- static from_dict(d: Dict[str, Any]) csle_common.dao.training.mixed_ppo_policy.MixedPPOPolicy [source]
Converts a dict representation of the object to an instance
- Parameters
d – the dict to convert
- Returns
the created instance
- static from_json_file(json_file_path: str) csle_common.dao.training.mixed_ppo_policy.MixedPPOPolicy [source]
Reads a json file and converts it to a DTO
- Parameters
json_file_path – the json file path
- Returns
the converted DTO
- probability(o: List[float], a: int) float [source]
Probability of a given action
- Parameters
o – the current observation
a – a given action
- Returns
the probability of a
csle_common.dao.training.multi_threshold_stopping_policy module
- class csle_common.dao.training.multi_threshold_stopping_policy.MultiThresholdStoppingPolicy(theta: List[float], simulation_name: str, L: int, states: List[csle_common.dao.simulation_config.state.State], player_type: csle_common.dao.training.player_type.PlayerType, actions: List[csle_common.dao.simulation_config.action.Action], experiment_config: Optional[csle_common.dao.training.experiment_config.ExperimentConfig], avg_R: float, agent_type: csle_common.dao.training.agent_type.AgentType, opponent_strategy: Optional[csle_common.dao.training.multi_threshold_stopping_policy.MultiThresholdStoppingPolicy] = None)[source]
Bases:
csle_common.dao.training.policy.Policy
A multi-threshold stopping policy
- action(o: List[float], deterministic: bool = True) int [source]
Selects the next action according to the multi-threshold stopping policy
- Parameters
o – the current observation
deterministic – boolean flag indicating whether the action selection should be deterministic
- Returns
the selected action
- copy() csle_common.dao.training.multi_threshold_stopping_policy.MultiThresholdStoppingPolicy [source]
- Returns
a copy of the DTO
- static from_dict(d: Dict[str, Any]) csle_common.dao.training.multi_threshold_stopping_policy.MultiThresholdStoppingPolicy [source]
Convert a dict representation of the object to an instance
- Parameters
d – the dict to convert
- Returns
the created instance
- static from_json_file(json_file_path: str) csle_common.dao.training.multi_threshold_stopping_policy.MultiThresholdStoppingPolicy [source]
Reads a json file and converts it to a DTO
- Parameters
json_file_path – the json file path
- Returns
the converted DTO
- static inverse_sigmoid(y) float [source]
The inverse sigmoid function
- Parameters
y – sigmoid(x)
- Returns
x such that sigmoid(x) = y, i.e. sigmoid^(-1)(y) (see the sketch below)
- probability(o: List[float], a: int) float [source]
Probability of a given action
- Parameters
o – the current observation
a – a given action
- Returns
the probability of a
- static smooth_threshold_action_selection(threshold: float, b1: float, threshold_action: int = 1, alternative_action: int = 1, k=-20) Tuple[int, float] [source]
Selects the next action according to a smooth threshold function on the belief (see the sketch below)
- Parameters
threshold – the threshold
b1 – the belief
threshold_action – the action to select if the threshold is exceeded
alternative_action – the alternative action to select if the threshold is not exceeded
k – the steepness parameter of the smooth threshold function
- Returns
the selected action and the probability
- stage_policy(o: Union[List[int], List[float]]) List[List[float]] [source]
Gets the stage policy, i.e. a |S|x|A| policy
- Parameters
o – the latest observation
- Returns
the |S|x|A| stage policy
- stop_distributions() Dict[str, List[float]] [source]
- Returns
the stop distributions and their names
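The sketches below illustrate the two helpers above. The inverse sigmoid is the logit function; the smooth threshold shown is the sigmoid-like relaxation of a hard belief threshold used in the optimal-stopping literature that CSLE builds on, with k (default -20) controlling the steepness. The exact expression in the implementation is assumed to match.

```python
import math

def inverse_sigmoid(y: float) -> float:
    # Logit: if y = sigmoid(x) = 1 / (1 + e^(-x)), then x = ln(y / (1 - y))
    return math.log(y / (1 - y))

def smooth_stop_prob(b1: float, threshold: float, k: float = -20.0) -> float:
    # Smooth approximation of "stop when belief b1 >= threshold":
    # tends to 1 above the threshold and to 0 below it; |k| sets the steepness.
    ratio = (b1 * (1 - threshold)) / (threshold * (1 - b1))
    return 1.0 / (1.0 + ratio ** k)
```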
csle_common.dao.training.player_type module
csle_common.dao.training.policy module
- class csle_common.dao.training.policy.Policy(agent_type: csle_common.dao.training.agent_type.AgentType, player_type: csle_common.dao.training.player_type.PlayerType)[source]
Bases:
csle_base.json_serializable.JSONSerializable
An abstract class representing a policy
- abstract action(o: Any, deterministic: bool) Union[int, List[int], float, numpy.ndarray[Any, numpy.dtype[Any]]] [source]
Calculates the next action
- Parameters
o – the input observation
deterministic – boolean flag indicating whether the action selection should be deterministic
- Returns
the action
- abstract copy() csle_common.dao.training.policy.Policy [source]
- Returns
a copy of the object
- abstract static from_dict(d: Dict[str, Any]) csle_common.dao.training.policy.Policy [source]
Converts a dict representation of the object to an instance
- Parameters
d – the dict representation to convert
- Returns
the converted object
- abstract static from_json_file(json_file_path: str) csle_common.dao.training.policy.Policy [source]
Reads a json file and converts it to a DTO
- Parameters
json_file_path – the json file path
- Returns
the converted DTO
- abstract probability(o: Any, a: int) Union[int, float] [source]
Calculates the probability of a given action for a given observation
- Parameters
o – the observation
a – the action
- Returns
the probability
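A minimal concrete subclass sketch showing the abstract surface of Policy; it assumes that to_dict() is also required by the JSONSerializable base and that PlayerType.DEFENDER exists.

```python
import json
import random
from typing import Any, Dict

from csle_common.dao.training.agent_type import AgentType
from csle_common.dao.training.player_type import PlayerType
from csle_common.dao.training.policy import Policy


class UniformPolicy(Policy):
    """Toy policy that picks uniformly among num_actions actions."""

    def __init__(self, num_actions: int) -> None:
        super().__init__(agent_type=AgentType.RANDOM,
                         player_type=PlayerType.DEFENDER)  # assumed enum member
        self.num_actions = num_actions

    def action(self, o: Any, deterministic: bool) -> int:
        return 0 if deterministic else random.randrange(self.num_actions)

    def probability(self, o: Any, a: int) -> float:
        return 1.0 / self.num_actions

    def copy(self) -> "UniformPolicy":
        return UniformPolicy(self.num_actions)

    def to_dict(self) -> Dict[str, Any]:  # assumed JSONSerializable requirement
        return {"num_actions": self.num_actions}

    @staticmethod
    def from_dict(d: Dict[str, Any]) -> "UniformPolicy":
        return UniformPolicy(num_actions=d["num_actions"])

    @staticmethod
    def from_json_file(json_file_path: str) -> "UniformPolicy":
        with open(json_file_path, "r") as f:
            return UniformPolicy.from_dict(json.load(f))
```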
csle_common.dao.training.policy_type module
- class csle_common.dao.training.policy_type.PolicyType(value)[source]
Bases:
enum.IntEnum
Enum representing the different policy types in CSLE
- ALPHA_VECTORS = 1
- C51 = 13
- DQN = 2
- FNN_W_SOFTMAX = 3
- LINEAR_TABULAR = 11
- LINEAR_THRESHOLD = 9
- MIXED_LINEAR_TABULAR = 12
- MIXED_MULTI_THRESHOLD = 4
- MIXED_PPO_POLICY = 10
- MULTI_THRESHOLD = 8
- PPO = 5
- RANDOM = 6
- TABULAR = 0
- VECTOR = 7
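Like AgentType, PolicyType is an IntEnum, so members interoperate with the integer codes stored in serialized policies:

```python
from csle_common.dao.training.policy_type import PolicyType

assert int(PolicyType.PPO) == 5
assert PolicyType(0) is PolicyType.TABULAR
```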
csle_common.dao.training.ppo_policy module
- class csle_common.dao.training.ppo_policy.PPOPolicy(model: Union[None, stable_baselines3.ppo.ppo.PPO, csle_common.models.ppo_network.PPONetwork], simulation_name: str, save_path: str, player_type: csle_common.dao.training.player_type.PlayerType, states: List[csle_common.dao.simulation_config.state.State], actions: List[csle_common.dao.simulation_config.action.Action], experiment_config: csle_common.dao.training.experiment_config.ExperimentConfig, avg_R: float)[source]
Bases:
csle_common.dao.training.policy.Policy
A neural network policy learned with PPO
- action(o: Union[List[float], List[int]], deterministic: bool = True) Union[int, float, numpy.ndarray[Any, numpy.dtype[Any]]] [source]
Selects the next action
- Parameters
o – the current observation
deterministic – boolean flag indicating whether the action selection should be deterministic
- Returns
the selected action
- copy() csle_common.dao.training.ppo_policy.PPOPolicy [source]
- Returns
a copy of the DTO
- static from_dict(d: Dict[str, Any]) csle_common.dao.training.ppo_policy.PPOPolicy [source]
Converts a dict representation of the object to an instance
- Parameters
d – the dict to convert
- Returns
the created instance
- static from_json_file(json_file_path: str) csle_common.dao.training.ppo_policy.PPOPolicy [source]
Reads a json file and converts it to a DTO
- Parameters
json_file_path – the json file path
- Returns
the converted DTO
- probability(o: Union[List[float], List[int]], a: int) float [source]
Calculates the probability of taking a given action for a given observation
- Parameters
o – the current observation
a – the action
- Returns
the probability of the action
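A minimal usage sketch based on the documented signatures; the file path is hypothetical.

```python
from csle_common.dao.training.ppo_policy import PPOPolicy

policy = PPOPolicy.from_json_file("/tmp/ppo_policy.json")  # hypothetical path
a = policy.action(o=[0.2, 0.8], deterministic=False)  # sample from the policy
p = policy.probability(o=[0.2, 0.8], a=1)             # p(a|o)
```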
csle_common.dao.training.random_policy module
- class csle_common.dao.training.random_policy.RandomPolicy(player_type: csle_common.dao.training.player_type.PlayerType, actions: List[csle_common.dao.simulation_config.action.Action], stage_policy_tensor: Optional[List[List[float]]])[source]
Bases:
csle_common.dao.training.policy.Policy
Object representing a random (static) policy
- action(o: Union[List[Union[int, float]], int, float], deterministic: bool = True) int [source]
Selects the next action
- Parameters
o – the input observation
deterministic – boolean flag indicating whether the action selection should be deterministic
- Returns
the next action
- copy() csle_common.dao.training.random_policy.RandomPolicy [source]
- Returns
a copy of the DTO
- static from_dict(d: Dict[str, Any]) csle_common.dao.training.random_policy.RandomPolicy [source]
Converts a dict representation to an instance
- Parameters
d – the dict to convert
- Returns
the created instance
- static from_json_file(json_file_path: str) csle_common.dao.training.random_policy.RandomPolicy [source]
Reads a json file and converts it to a DTO
- Parameters
json_file_path – the json file path
- Returns
the converted DTO
- probability(o: Union[List[Union[int, float]], int, float], a: int) float [source]
Calculates the probability of taking a given action for a given observation
- Parameters
o – the input observation
a – the action
- Returns
p(a|o)
csle_common.dao.training.tabular_policy module
- class csle_common.dao.training.tabular_policy.TabularPolicy(player_type: csle_common.dao.training.player_type.PlayerType, actions: List[csle_common.dao.simulation_config.action.Action], lookup_table: List[List[float]], agent_type: csle_common.dao.training.agent_type.AgentType, simulation_name: str, avg_R: float, value_function: Optional[List[Any]] = None, q_table: Optional[List[Any]] = None)[source]
Bases:
csle_common.dao.training.policy.Policy
Object representing a tabular policy
- action(o: Union[int, float], deterministic: bool = True) Union[int, List[int], float, numpy.ndarray[Any, numpy.dtype[Any]]] [source]
Selects the next action
- Parameters
o – the input observation
deterministic – boolean flag indicating whether the action selection should be deterministic
- Returns
the next action
- copy() csle_common.dao.training.tabular_policy.TabularPolicy [source]
- Returns
a copy of the DTO
- static from_dict(d: Dict[str, Any]) csle_common.dao.training.tabular_policy.TabularPolicy [source]
Converts a dict representation to an instance
- Parameters
d – the dict to convert
- Returns
the created instance
- static from_json_file(json_file_path: str) csle_common.dao.training.tabular_policy.TabularPolicy [source]
Reads a json file and converts it to a DTO
- Parameters
json_file_path – the json file path
- Returns
the converted DTO
- probability(o: Union[int, float], a: int) float [source]
Calculates the probability of taking a given action for a given observation
- Parameters
o – the input observation
a – the action
- Returns
p(a|o)
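For intuition, a standalone sketch of the tabular lookup semantics: lookup_table is interpreted as an |S| x |A| matrix whose row o is the action distribution for observation o. This illustrates the principle, not the DTO internals.

```python
import random

lookup_table = [[0.9, 0.1],   # pi(.|o=0)
                [0.2, 0.8]]   # pi(.|o=1)

def tabular_action(o: int, deterministic: bool = True) -> int:
    row = lookup_table[o]
    if deterministic:
        return max(range(len(row)), key=lambda a: row[a])    # argmax_a pi(a|o)
    return random.choices(range(len(row)), weights=row)[0]   # a ~ pi(.|o)
```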
csle_common.dao.training.vector_policy module
- class csle_common.dao.training.vector_policy.VectorPolicy(player_type: csle_common.dao.training.player_type.PlayerType, actions: List[int], policy_vector: List[float], agent_type: csle_common.dao.training.agent_type.AgentType, simulation_name: str, avg_R: float)[source]
Bases:
csle_common.dao.training.policy.Policy
Object representing a vector policy
- action(o: Union[List[Union[int, float]], int, float], deterministic: bool = True) Union[int, List[int], float, numpy.ndarray[Any, numpy.dtype[Any]]] [source]
Selects the next action
- Parameters
o – the input observation
deterministic – boolean flag indicating whether the action selection should be deterministic
- Returns
the next action
- copy() csle_common.dao.training.vector_policy.VectorPolicy [source]
- Returns
a copy of the DTO
- static from_dict(d: Dict[str, Any]) csle_common.dao.training.vector_policy.VectorPolicy [source]
Converts a dict representation to an instance
- Parameters
d – the dict to convert
- Returns
the created instance
- static from_json_file(json_file_path: str) csle_common.dao.training.vector_policy.VectorPolicy [source]
Reads a json file and converts it to a DTO
- Parameters
json_file_path – the json file path
- Returns
the converted DTO
- probability(o: Union[List[Union[int, float]], int, float], a: int) float [source]
Calculates the probability of taking a given action for a given observation
- Parameters
o – the input observation
a – the action
- Returns
p(a|o)