csle_agents.agents.t_spsa package
Submodules
csle_agents.agents.t_spsa.t_spsa_agent module
- class csle_agents.agents.t_spsa.t_spsa_agent.TSPSAAgent(simulation_env_config: csle_common.dao.simulation_config.simulation_env_config.SimulationEnvConfig, emulation_env_config: Union[None, csle_common.dao.emulation_config.emulation_env_config.EmulationEnvConfig], experiment_config: csle_common.dao.training.experiment_config.ExperimentConfig, env: Optional[csle_common.dao.simulation_config.base_env.BaseEnv] = None, training_job: Optional[csle_common.dao.jobs.training_job_config.TrainingJobConfig] = None, save_to_metastore: bool = True)[source]
Bases:
csle_agents.agents.base.base_agent.BaseAgent
RL agent implementing the T-SPSA algorithm from (Hammar, Stadler 2021 - Intrusion Prevention through Optimal Stopping)
- batch_gradient(theta: List[float], ck: float, L: int, k: int, gradient_batch_size: int = 1)[source]
Computes a batch of gradients and returns the average
- Parameters
theta – the current parameter vector
k – the current training iteration
ck – the perturbation step size
L – the total number of stops for the defender
gradient_batch_size – the number of gradients to include in the batch
- Returns
the average of the batch of gradients
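The averaging step can be sketched as below; `average_gradients` is a hypothetical helper (not part of the csle API) showing how a batch of gradient estimates is reduced to one lower-variance estimate:

```python
def average_gradients(grads):
    """Component-wise mean of a batch of gradient vectors."""
    n = len(grads)
    return [sum(g[i] for g in grads) / n for i in range(len(grads[0]))]
```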
- static compute_avg_metrics(metrics: Dict[str, List[Union[float, int]]]) Dict[str, Union[float, int]] [source]
Computes the average metrics of a dict with aggregated metrics
- Parameters
metrics – the dict with the aggregated metrics
- Returns
the average metrics
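A plausible sketch of this reduction, assuming each key maps to a list of per-episode scalars (the exact aggregation in csle may differ):

```python
def compute_avg_metrics(metrics):
    # Reduce each key's list of per-episode values to its mean
    return {k: sum(v) / len(v) for k, v in metrics.items()}
```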
- estimate_gk(theta: List[float], deltak: List[float], ck: float, L: int)[source]
Estimates the gradient at iteration k of the T-SPSA algorithm
- Parameters
theta – the current parameter vector
deltak – the perturbation direction vector
ck – the perturbation step size
L – the total number of stops for the defender
- Returns
the estimated gradient
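This is the standard two-sided simultaneous-perturbation estimator (Spall): all components of the gradient share the same two objective evaluations. In the sketch below, `J` is a stand-in for evaluating the threshold policy's average return; the csle implementation instead evaluates the policy by simulating episodes:

```python
def estimate_gk(J, theta, deltak, ck):
    """Two-sided SPSA gradient estimate.

    theta: parameter vector, deltak: +/-1 perturbation direction,
    ck: perturbation step size, J: objective to maximize (assumption:
    higher return is better).
    """
    j_plus = J([t + ck * d for t, d in zip(theta, deltak)])
    j_minus = J([t - ck * d for t, d in zip(theta, deltak)])
    # Every component reuses the same pair of evaluations
    return [(j_plus - j_minus) / (2 * ck * d) for d in deltak]
```

For a one-dimensional quadratic objective the perturbation cancels exactly, so the estimate equals the true gradient.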
- eval_theta(policy: Union[csle_common.dao.training.multi_threshold_stopping_policy.MultiThresholdStoppingPolicy, csle_common.dao.training.linear_threshold_stopping_policy.LinearThresholdStoppingPolicy], max_steps: int = 200) Dict[str, Any] [source]
Evaluates a given threshold policy by running Monte-Carlo simulations
- Parameters
policy – the policy to evaluate
max_steps – the maximum number of steps per evaluation episode
- Returns
the average metrics of the evaluation
- get_policy(theta: List[float], L: int) Union[csle_common.dao.training.multi_threshold_stopping_policy.MultiThresholdStoppingPolicy, csle_common.dao.training.linear_threshold_stopping_policy.LinearThresholdStoppingPolicy] [source]
Gets the policy from a parameter vector
- Parameters
theta – the parameter vector
L – the number of parameters
- Returns
the policy object
- static initial_theta(L: int) numpy.ndarray[Any, numpy.dtype[Any]] [source]
Initializes theta randomly
- Parameters
L – the dimension of theta
- Returns
the initialized theta vector
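A minimal sketch of a random initialization with one threshold parameter per stop; the uniform range used here is an assumption, not necessarily what csle uses:

```python
import random

def initial_theta(L):
    # One threshold parameter per stop level (range is illustrative)
    return [random.uniform(-1.0, 1.0) for _ in range(L)]
```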
- static round_vec(vec) List[float] [source]
Rounds a vector to 3 decimals
- Parameters
vec – the vector to round
- Returns
the rounded vector
- spsa(exp_result: csle_common.dao.training.experiment_result.ExperimentResult, seed: int, training_job: csle_common.dao.jobs.training_job_config.TrainingJobConfig, random_seeds: List[int]) csle_common.dao.training.experiment_result.ExperimentResult [source]
Runs the SPSA algorithm
- Parameters
exp_result – the experiment result object to store the result
seed – the seed
training_job – the training job config
random_seeds – list of seeds
- Returns
the updated experiment result and the trained policy
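Put together, one SPSA iteration perturbs theta in a random direction, estimates the gradient from two objective evaluations, and takes an ascent step with a decaying gain. The sketch below is a self-contained illustration with Spall's usual hyperparameter defaults, not the csle implementation (which evaluates the objective by simulating episodes and checkpoints results to the metastore):

```python
import random

def spsa_maximize(J, theta, iterations, a=1.0, A=100.0, epsilon=0.602,
                  c=1.0, lamb=0.101):
    """Sketch of SPSA gradient ascent on an objective J.

    Hyperparameter defaults follow Spall's standard recommendations and
    are illustrative only.
    """
    for k in range(1, iterations + 1):
        ak = a / (k + A) ** epsilon            # step size a_k
        ck = c / k ** lamb                     # perturbation size c_k
        delta = [random.choice([-1.0, 1.0]) for _ in theta]
        j_plus = J([t + ck * d for t, d in zip(theta, delta)])
        j_minus = J([t - ck * d for t, d in zip(theta, delta)])
        gk = [(j_plus - j_minus) / (2 * ck * d) for d in delta]
        theta = [t + ak * g for t, g in zip(theta, gk)]  # ascent step
    return theta
```

For a one-dimensional quadratic objective the perturbation direction cancels exactly, so the iterates converge to the maximizer.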
- static standard_ak(a: int, A: int, epsilon: float, k: int) float [source]
Gets the step size for gradient ascent at iteration k
- Parameters
a – a scalar hyperparameter
A – a scalar hyperparameter
epsilon – the epsilon scalar hyperparameter
k – the iteration index
- Returns
the step size a_k
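This matches the classic SPSA gain sequence a_k = a / (k + A)^epsilon; a sketch (the csle implementation should be equivalent up to naming):

```python
def standard_ak(a, A, epsilon, k):
    # Classic SPSA decaying gain sequence: a_k = a / (k + A)^epsilon
    return a / (k + A) ** epsilon
```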
- static standard_ck(c: float, lamb: float, k: int) float [source]
Gets the step size of perturbations at iteration k
- Parameters
c – a scalar hyperparameter
lamb – (lambda) a scalar hyperparameter
k – the iteration
- Returns
the perturbation step size
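This corresponds to the standard SPSA perturbation-magnitude sequence c_k = c / k^lambda; a one-line sketch:

```python
def standard_ck(c, lamb, k):
    # Perturbation-magnitude sequence: c_k = c / k^lambda
    return c / k ** lamb
```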
- static standard_deltak(dimension: int, k: int) List[float] [source]
Gets the perturbation direction at iteration k
- Parameters
k – the iteration
dimension – the dimension of the perturbation vector
- Returns
delta_k, the perturbation vector at iteration k
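SPSA conventionally draws each component of delta_k from a Rademacher distribution (+1 or -1 with equal probability). A sketch; seeding the generator by the iteration index for reproducibility is an assumption:

```python
import random

def standard_deltak(dimension, k):
    # Rademacher perturbation: each component is +1 or -1 with equal
    # probability (seeding by k is an assumption, for reproducibility)
    rng = random.Random(k)
    return [rng.choice([-1.0, 1.0]) for _ in range(dimension)]
```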
- train() csle_common.dao.training.experiment_execution.ExperimentExecution [source]
Performs the policy training for the given random seeds using T-SPSA
- Returns
the training metrics and the trained policies
- static update_metrics(metrics: Dict[str, List[Union[float, int]]], info: Dict[str, Union[float, int]]) Dict[str, List[Union[float, int]]] [source]
Updates a dict with aggregated metrics using new information from the environment
- Parameters
metrics – the dict with the aggregated metrics
info – the new information
- Returns
the updated dict
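A plausible sketch of this aggregation step, assuming `info` holds per-step scalars from the environment (the exact keys in csle may differ):

```python
def update_metrics(metrics, info):
    # Append each new scalar from the environment's info dict to the
    # corresponding aggregate list, creating the list if needed
    for key, value in info.items():
        metrics.setdefault(key, []).append(value)
    return metrics
```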