csle_agents.agents.kiefer_wolfowitz package

Submodules

csle_agents.agents.kiefer_wolfowitz.kiefer_wolfowitz_agent module

class csle_agents.agents.kiefer_wolfowitz.kiefer_wolfowitz_agent.KieferWolfowitzAgent(simulation_env_config: SimulationEnvConfig, emulation_env_config: Union[None, EmulationEnvConfig], experiment_config: ExperimentConfig, env: Optional[BaseEnv] = None, training_job: Optional[TrainingJobConfig] = None, save_to_metastore: bool = True)[source]

Bases: BaseAgent

RL agent implementing the Kiefer-Wolfowitz stochastic approximation (SA) algorithm, first published in 1952

batch_gradient(theta: List[float], delta: float, L: int, gradient_batch_size: int = 1)[source]

Computes a batch of gradients and returns the average

Parameters

theta – the current parameter vector
delta – the perturbation size
L – the total number of stops for the defender
gradient_batch_size – the number of gradients to include in the batch

Returns

the average of the batch of gradients
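
The averaging step can be illustrated with a minimal sketch. It is not the library's implementation: the helper name average_gradient_estimates and the estimate_once callable are hypothetical stand-ins for repeated calls to a gradient estimator such as estimate_gk.

```python
from typing import Callable, List


def average_gradient_estimates(
    estimate_once: Callable[[], List[float]],
    gradient_batch_size: int = 1,
) -> List[float]:
    """Hypothetical helper: average several independent gradient estimates."""
    if gradient_batch_size < 1:
        raise ValueError("gradient_batch_size must be at least 1")
    # Draw the batch of gradient estimates, one call per sample.
    estimates = [estimate_once() for _ in range(gradient_batch_size)]
    dim = len(estimates[0])
    # Component-wise mean over the batch.
    return [sum(est[i] for est in estimates) / gradient_batch_size for i in range(dim)]


# Usage with two fixed "estimates" standing in for noisy simulation results.
samples = iter([[1.0, 2.0], [3.0, 4.0]])
avg = average_gradient_estimates(lambda: next(samples), gradient_batch_size=2)
```

Averaging over a larger batch reduces the variance of the gradient estimate at the cost of more simulation runs per iteration.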

static compute_avg_metrics(metrics: Dict[str, List[Union[float, int]]]) → Dict[str, Union[float, int]][source]

Computes the average metrics of a dict with aggregated metrics

Parameters: metrics – the dict with the aggregated metrics
Returns: the average metrics
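
A minimal sketch of this kind of aggregation, assuming each metric is averaged with a plain arithmetic mean and that empty lists are skipped (both are assumptions; average_metrics is a hypothetical name, not the library function):

```python
from typing import Dict, List, Union

Number = Union[float, int]


def average_metrics(metrics: Dict[str, List[Number]]) -> Dict[str, float]:
    """Hypothetical helper: map each metric name to the mean of its values."""
    # Empty lists are skipped to avoid division by zero (an assumption).
    return {
        name: sum(values) / len(values)
        for name, values in metrics.items()
        if values
    }


averaged = average_metrics({"R": [1, 2, 3], "T": [10.0, 20.0], "empty": []})
```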

estimate_gk(theta: List[float], delta: float, L: int)[source]

Estimate the gradient at iteration k of the Kiefer-Wolfowitz algorithm

Parameters

theta – the current parameter vector
delta – the perturbation size
L – the total number of stops for the defender

Returns

the estimated gradient
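
For orientation, the textbook Kiefer-Wolfowitz gradient estimate is a central finite difference along each coordinate. The sketch below shows that textbook form only; whether estimate_gk uses exactly this scheme is not stated here, and the objective callable is a hypothetical stand-in for the simulation-based evaluation of a parameter vector.

```python
from typing import Callable, List


def finite_difference_gradient(
    objective: Callable[[List[float]], float],
    theta: List[float],
    delta: float,
) -> List[float]:
    """Hypothetical helper: central-difference gradient estimate at theta."""
    gradient = []
    for i in range(len(theta)):
        plus = list(theta)
        minus = list(theta)
        plus[i] += delta   # perturb coordinate i upwards
        minus[i] -= delta  # perturb coordinate i downwards
        gradient.append((objective(plus) - objective(minus)) / (2.0 * delta))
    return gradient


# Usage on a noise-free quadratic, where the central difference is exact.
grad = finite_difference_gradient(
    lambda th: sum(x * x for x in th), [1.0, -2.0], delta=0.5
)
```

With a noisy objective, each call returns a different estimate, which is why several estimates are typically averaged.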

eval_theta(policy: Union[MultiThresholdStoppingPolicy, LinearThresholdStoppingPolicy], max_steps: int = 200) → Dict[str, Any][source]

Evaluates a given threshold policy by running Monte Carlo simulations

Parameters

policy – the policy to evaluate
max_steps – the maximum number of steps (default: 200)

Returns

the average metrics of the evaluation

get_policy(theta: List[float], L: int) → Union[MultiThresholdStoppingPolicy, LinearThresholdStoppingPolicy][source]

Utility method for getting the policy of a given parameter vector

Parameters

theta – the parameter vector
L – the number of parameters

Returns

the policy

hparam_names() → List[str][source]

Returns: a list with the hyperparameter names

static initial_theta(L: int) → ndarray[Any, dtype[Any]][source]

Initializes theta randomly

Parameters: L – the dimension of theta
Returns: the initialized theta vector

kiefer_wolfowitz(exp_result: ExperimentResult, seed: int, training_job: TrainingJobConfig, random_seeds: List[int]) → ExperimentResult[source]

Runs the Kiefer-Wolfowitz algorithm

Parameters

exp_result – the experiment result object to store the result
seed – the seed
training_job – the training job config
random_seeds – list of seeds

Returns

the updated experiment result and the trained policy
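
The overall iteration can be sketched in its textbook form: at step k, estimate the gradient by finite differences with perturbation size c_k and move theta by a step a_k, where both sequences decrease to zero. The step-size schedules, the ascent direction, and all names below (kiefer_wolfowitz_sketch, objective, a, c) are assumptions made for illustration and are not taken from the library.

```python
from typing import Callable, List


def kiefer_wolfowitz_sketch(
    objective: Callable[[List[float]], float],
    theta0: List[float],
    num_iterations: int = 200,
    a: float = 0.5,
    c: float = 0.1,
) -> List[float]:
    """Hypothetical helper: textbook Kiefer-Wolfowitz ascent on `objective`."""
    theta = list(theta0)
    for k in range(num_iterations):
        a_k = a / (k + 1)          # step size, decreasing to zero
        c_k = c / (k + 1) ** 0.25  # perturbation size, decreasing more slowly
        gradient = []
        for i in range(len(theta)):
            plus = list(theta)
            minus = list(theta)
            plus[i] += c_k
            minus[i] -= c_k
            gradient.append((objective(plus) - objective(minus)) / (2.0 * c_k))
        # Ascent step: move theta in the direction of the estimated gradient.
        theta = [t + a_k * g for t, g in zip(theta, gradient)]
    return theta


# Usage on a noise-free concave objective with its maximum at theta = 3.
theta_hat = kiefer_wolfowitz_sketch(lambda th: -(th[0] - 3.0) ** 2, [0.0])
```

In the agent, the objective would be a noisy return estimated from simulation rather than a closed-form function.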

static round_vec(vec) → List[float][source]

Rounds a vector to 3 decimals

Parameters: vec – the vector to round
Returns: the rounded vector
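
A minimal sketch of such a rounding helper, assuming element-wise rounding with Python's built-in round (round_vector is a hypothetical name):

```python
from typing import Iterable, List


def round_vector(vec: Iterable[float], decimals: int = 3) -> List[float]:
    """Hypothetical helper: round every element to `decimals` decimals."""
    return [round(float(x), decimals) for x in vec]


rounded = round_vector([0.12345, 1.0, 2.71828])
```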

train() → ExperimentExecution[source]

Performs the policy training for the given random seeds using Kiefer-Wolfowitz

Returns: the training metrics and the trained policies

static update_metrics(metrics: Dict[str, List[Union[float, int]]], info: Dict[str, Union[float, int]]) → Dict[str, List[Union[float, int]]][source]

Update a dict with aggregated metrics using new information from the environment

Parameters

metrics – the dict with the aggregated metrics
info – the new information

Returns

the updated dict
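
One way such an update could work is to append each new value to the list stored under the same key. The sketch below assumes that keys not yet present are created on first use; append_metrics is a hypothetical name, not the library function.

```python
from typing import Dict, List, Union

Number = Union[float, int]


def append_metrics(
    metrics: Dict[str, List[Number]],
    info: Dict[str, Number],
) -> Dict[str, List[Number]]:
    """Hypothetical helper: append each value in `info` to its metric list."""
    for key, value in info.items():
        # Create the list on first use (an assumption), then append.
        metrics.setdefault(key, []).append(value)
    return metrics


aggregated = append_metrics({"R": [1.0]}, {"R": 2.0, "T": 5})
```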

Module contents