csle_agents.agents.t_spsa package

Submodules

csle_agents.agents.t_spsa.t_spsa_agent module

class csle_agents.agents.t_spsa.t_spsa_agent.TSPSAAgent(simulation_env_config: csle_common.dao.simulation_config.simulation_env_config.SimulationEnvConfig, emulation_env_config: Union[None, csle_common.dao.emulation_config.emulation_env_config.EmulationEnvConfig], experiment_config: csle_common.dao.training.experiment_config.ExperimentConfig, env: Optional[csle_common.dao.simulation_config.base_env.BaseEnv] = None, training_job: Optional[csle_common.dao.jobs.training_job_config.TrainingJobConfig] = None, save_to_metastore: bool = True)[source]

Bases: csle_agents.agents.base.base_agent.BaseAgent

RL agent implementing the T-SPSA algorithm from (Hammar, Stadler 2021 - Intrusion Prevention through Optimal Stopping)

batch_gradient(theta: List[float], ck: float, L: int, k: int, gradient_batch_size: int = 1)[source]

Computes a batch of gradient estimates and returns their average

Parameters
  • theta – the current parameter vector

  • k – the current training iteration

  • ck – the perturbation step size

  • L – the total number of stops for the defender

  • gradient_batch_size – the number of gradients to include in the batch

Returns

the average of the batch of gradients
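The averaging step can be sketched as follows. This is a minimal illustration, not the library's implementation: the name average_gradients and the estimate_fn callback are hypothetical, with estimate_fn standing in for one call to a gradient estimator such as estimate_gk.

```python
from typing import Callable, List


def average_gradients(estimate_fn: Callable[[], List[float]],
                      gradient_batch_size: int = 1) -> List[float]:
    """Hypothetical helper: element-wise mean of several gradient estimates."""
    if gradient_batch_size < 1:
        raise ValueError("gradient_batch_size must be at least 1")
    # Draw one gradient estimate per batch entry
    batch = [estimate_fn() for _ in range(gradient_batch_size)]
    # zip(*batch) groups the i-th component of every estimate together
    return [sum(component) / len(batch) for component in zip(*batch)]
```

Averaging several estimates reduces the variance of the gradient at the cost of more policy evaluations per iteration.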

static compute_avg_metrics(metrics: Dict[str, List[Union[float, int]]]) Dict[str, Union[float, int]][source]

Computes the average metrics of a dict with aggregated metrics

Parameters

metrics – the dict with the aggregated metrics

Returns

the average metrics
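A minimal sketch of such an averaging step, assuming every value in the dict is a list of numbers. The name average_metrics is hypothetical, and skipping empty lists is an assumption of this sketch rather than documented behavior.

```python
from typing import Dict, List, Union

Number = Union[float, int]


def average_metrics(metrics: Dict[str, List[Number]]) -> Dict[str, float]:
    """Hypothetical helper: map each metric name to the mean of its values."""
    # Assumption: metrics with no recorded values are left out of the result
    return {name: sum(values) / len(values)
            for name, values in metrics.items() if values}
```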

estimate_gk(theta: List[float], deltak: List[float], ck: float, L: int)[source]

Estimates the gradient at iteration k of the T-SPSA algorithm

Parameters
  • theta – the current parameter vector

  • deltak – the perturbation direction vector

  • ck – the perturbation step size

  • L – the total number of stops for the defender

Returns

the estimated gradient
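For orientation, the standard two-sided SPSA gradient estimate can be sketched as below. This assumes the method follows the textbook SPSA form, which this page does not confirm. The objective callable is hypothetical: in the agent, the objective value would presumably come from evaluating the policy induced by theta (see eval_theta), and the parameter L is omitted here.

```python
from typing import Callable, List, Sequence


def spsa_gradient(objective: Callable[[List[float]], float],
                  theta: Sequence[float],
                  deltak: Sequence[float],
                  ck: float) -> List[float]:
    """Textbook SPSA estimate: two objective evaluations per gradient."""
    # Perturb all coordinates simultaneously in the direction +/- deltak
    theta_plus = [t + ck * d for t, d in zip(theta, deltak)]
    theta_minus = [t - ck * d for t, d in zip(theta, deltak)]
    diff = objective(theta_plus) - objective(theta_minus)
    # The same difference is reused for every coordinate
    return [diff / (2.0 * ck * d) for d in deltak]
```

The estimate needs only two evaluations regardless of the dimension of theta, which is why SPSA suits objectives that can only be measured through simulation.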

eval_theta(policy: Union[csle_common.dao.training.multi_threshold_stopping_policy.MultiThresholdStoppingPolicy, csle_common.dao.training.linear_threshold_stopping_policy.LinearThresholdStoppingPolicy], max_steps: int = 200) Dict[str, Any][source]

Evaluates a given threshold policy by running Monte Carlo simulations

Parameters
  • policy – the policy to evaluate

  • max_steps – the maximum number of steps to run in the evaluation (default: 200)

Returns

the average metrics of the evaluation

get_policy(theta: List[float], L: int) Union[csle_common.dao.training.multi_threshold_stopping_policy.MultiThresholdStoppingPolicy, csle_common.dao.training.linear_threshold_stopping_policy.LinearThresholdStoppingPolicy][source]

Gets the policy from a parameter vector

Parameters
  • theta – the parameter vector

  • L – the number of parameters

Returns

the policy object

hparam_names() List[str][source]

Returns

a list with the hyperparameter names

static initial_theta(L: int) numpy.ndarray[Any, numpy.dtype[Any]][source]

Initializes theta randomly

Parameters

L – the dimension of theta

Returns

the initialized theta vector

static round_vec(vec) List[float][source]

Rounds a vector to 3 decimals

Parameters

vec – the vector to round

Returns

the rounded vector

spsa(exp_result: csle_common.dao.training.experiment_result.ExperimentResult, seed: int, training_job: csle_common.dao.jobs.training_job_config.TrainingJobConfig, random_seeds: List[int]) csle_common.dao.training.experiment_result.ExperimentResult[source]

Runs the SPSA algorithm

Parameters
  • exp_result – the experiment result object to store the result

  • seed – the seed

  • training_job – the training job config

  • random_seeds – list of seeds

Returns

the updated experiment result and the trained policy

static standard_ak(a: int, A: int, epsilon: float, k: int) float[source]

Gets the step size for gradient ascent at iteration k

Parameters
  • a – a scalar hyperparameter

  • A – a scalar hyperparameter

  • epsilon – the epsilon scalar hyperparameter

  • k – the iteration index

Returns

the step size a_k
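In the standard SPSA formulation, the gain sequence is a_k = a / (k + 1 + A)^epsilon. The sketch below assumes this method follows that form, which this page does not state; the function name spsa_step_size is hypothetical.

```python
def spsa_step_size(a: float, A: float, epsilon: float, k: int) -> float:
    """Standard SPSA gain sequence a_k = a / (k + 1 + A) ** epsilon."""
    # A is a stability constant that damps the earliest steps;
    # epsilon controls how fast the step size decays with k
    return a / (k + 1 + A) ** epsilon
```

With this form the step size decreases monotonically in k, so later iterations make smaller updates to theta.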

static standard_ck(c: float, lamb: float, k: int) float[source]

Gets the step size of perturbations at iteration k

Parameters
  • c – a scalar hyperparameter

  • lamb – (lambda) a scalar hyperparameter

  • k – the iteration

Returns

the perturbation step size
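The corresponding standard SPSA perturbation sequence is c_k = c / (k + 1)^lambda. As with the gain sequence, the sketch assumes the textbook form; the function name spsa_perturbation_size is hypothetical.

```python
def spsa_perturbation_size(c: float, lamb: float, k: int) -> float:
    """Standard SPSA perturbation sequence c_k = c / (k + 1) ** lamb."""
    # The perturbation shrinks with k, so the finite-difference
    # approximation of the gradient becomes more local over time
    return c / (k + 1) ** lamb
```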

static standard_deltak(dimension: int, k: int) List[float][source]

Gets the perturbation direction at iteration k

Parameters
  • k – the iteration

  • dimension – the dimension of the perturbation vector

Returns

delta_k, the perturbation vector at iteration k
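A common choice in SPSA is to draw each component of delta_k independently from a symmetric Bernoulli distribution over {-1, +1}. The sketch below assumes that choice, which this page does not confirm. The function name and the optional rng argument are hypothetical, and k is kept only to mirror the documented signature.

```python
import random
from typing import List, Optional


def bernoulli_deltak(dimension: int, k: int,
                     rng: Optional[random.Random] = None) -> List[float]:
    """Draw a perturbation direction with i.i.d. components in {-1.0, 1.0}."""
    if rng is None:
        rng = random.Random()
    # k is unused in this sketch: the distribution is the same at every iteration
    return [float(rng.choice((-1, 1))) for _ in range(dimension)]
```

Passing a seeded random.Random instance makes the sequence of directions reproducible across runs.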

train() csle_common.dao.training.experiment_execution.ExperimentExecution[source]

Performs the policy training for the given random seeds using T-SPSA

Returns

the training metrics and the trained policies

static update_metrics(metrics: Dict[str, List[Union[float, int]]], info: Dict[str, Union[float, int]]) Dict[str, List[Union[float, int]]][source]

Updates a dict with aggregated metrics using new information from the environment

Parameters
  • metrics – the dict with the aggregated metrics

  • info – the new information

Returns

the updated dict
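One plausible aggregation scheme, consistent with the type signature, appends each new value to the list stored under the same key. The sketch below assumes that scheme; the name append_metrics is hypothetical, and creating a list for a previously unseen key is an assumption of this sketch.

```python
from typing import Dict, List, Union

Number = Union[float, int]


def append_metrics(metrics: Dict[str, List[Number]],
                   info: Dict[str, Number]) -> Dict[str, List[Number]]:
    """Hypothetical helper: append each value in info to its metric list."""
    for name, value in info.items():
        # Assumption: unseen metric names start a new list
        metrics.setdefault(name, []).append(value)
    return metrics
```

The dict is modified in place and also returned, so the call can be chained or assigned back to the same variable.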

Module contents