csle_agents.agents.lp_nf package
Submodules
csle_agents.agents.lp_nf.linear_programming_normal_form_game_agent module
- class csle_agents.agents.lp_nf.linear_programming_normal_form_game_agent.LinearProgrammingNormalFormGameAgent(simulation_env_config: SimulationEnvConfig, experiment_config: ExperimentConfig, env: Optional[BaseEnv] = None, emulation_env_config: Union[None, EmulationEnvConfig] = None, training_job: Optional[TrainingJobConfig] = None, save_to_metastore: bool = True)[source]
Bases:
BaseAgent
Linear programming agent for normal-form games
- static compute_avg_metrics(metrics: Dict[str, List[Union[float, int]]]) Dict[str, Union[float, int]] [source]
Computes the average metrics of a dict with aggregated metrics
- Parameters
metrics – the dict with the aggregated metrics
- Returns
the average metrics
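A minimal sketch of what such per-key averaging typically looks like (pure Python, not the agent's actual implementation; the exact aggregation in the source may differ):

```python
from statistics import mean
from typing import Dict, List, Union


def compute_avg_metrics(metrics: Dict[str, List[Union[float, int]]]) -> Dict[str, Union[float, int]]:
    # For each metric key, replace the list of recorded values with its mean;
    # keys with empty lists are dropped to avoid division by zero
    return {k: mean(v) for k, v in metrics.items() if v}


compute_avg_metrics({"avg_reward": [1.0, 2.0, 3.0], "steps": [10.0, 20.0]})
# -> {'avg_reward': 2.0, 'steps': 15.0}
```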
- compute_equilibrium_strategies_in_matrix_game(A: ndarray[Any, dtype[Any]], A1: ndarray[Any, dtype[Any]], A2: ndarray[Any, dtype[Any]]) Tuple[ndarray[Any, dtype[Any]], ndarray[Any, dtype[Any]], float] [source]
Computes equilibrium strategies in a matrix game
- Parameters
A – the matrix game
A1 – the action set of player 1 (the maximizer)
A2 – the action set of player 2 (the minimizer)
- Returns
the equilibrium strategy profile and the value
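In a zero-sum matrix game, the equilibrium strategy profile consists of the maximizer's maximin strategy and the minimizer's minimax strategy. A small numpy check (an illustrative sketch, not the agent's code) that a candidate profile is an equilibrium of rock-paper-scissors, i.e. that neither player can gain by deviating to any pure action:

```python
import numpy as np

# Rock-paper-scissors payoff matrix for the row player (the maximizer)
A = np.array([[0., -1., 1.],
              [1., 0., -1.],
              [-1., 1., 0.]])
x = np.full(3, 1 / 3)  # candidate strategy of player 1 (rows)
y = np.full(3, 1 / 3)  # candidate strategy of player 2 (columns)
v = x @ A @ y          # value of the game under (x, y); here 0

# Equilibrium conditions: no pure-row deviation beats v, and
# no pure-column deviation pushes the payoff below v
assert np.all(A @ y <= v + 1e-9)
assert np.all(x @ A >= v - 1e-9)
```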
- compute_matrix_game_value(A: ndarray[Any, dtype[Any]], A1: ndarray[Any, dtype[Any]], A2: ndarray[Any, dtype[Any]], maximizer: bool = True) Tuple[Any, ndarray[Any, dtype[Any]]] [source]
Uses linear programming to compute the value of a matrix game; also computes the maximin or minimax strategy
- Parameters
A – the matrix game
A1 – the action set of player 1
A2 – the action set of player 2
maximizer – boolean flag indicating whether to compute the maximin strategy (True) or the minimax strategy (False)
- Returns
(val(A), maximin or minimax strategy)
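The standard LP behind such a computation can be sketched with scipy (a hedged sketch under standard assumptions, not the agent's actual implementation): maximize the value v subject to the mixed strategy x guaranteeing at least v against every pure column, with x constrained to the probability simplex.

```python
import numpy as np
from scipy.optimize import linprog


def matrix_game_value(A: np.ndarray):
    """Maximin strategy and value for the row player of a zero-sum matrix game A.
    Decision variables: x (mixed strategy over the m rows) and v (the game value)."""
    m, n = A.shape
    # Objective: maximize v  ->  linprog minimizes, so minimize -v
    c = np.zeros(m + 1)
    c[-1] = -1.0
    # For every column j: sum_i A[i, j] * x[i] >= v   <=>   -A^T x + v <= 0
    A_ub = np.hstack([-A.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    # Probability simplex: sum_i x[i] = 1
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]  # x >= 0, v free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1], res.x[:m]  # (val(A), maximin strategy)
```

The minimax strategy of the column player can be obtained from the same routine by solving the dual game, e.g. `matrix_game_value(-A.T)`.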
- linear_programming_normal_form(exp_result: ExperimentResult, seed: int, training_job: TrainingJobConfig, random_seeds: List[int]) ExperimentResult [source]
Runs the linear programming algorithm for normal-form games
- Parameters
exp_result – the experiment result object to store the result
seed – the seed
training_job – the training job config
random_seeds – list of seeds
- Returns
the updated experiment result
- static round_vec(vec) List[float] [source]
Rounds a vector to 3 decimals
- Parameters
vec – the vector to round
- Returns
the rounded vector
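A one-line sketch of such rounding (illustrative only; the source may handle types differently):

```python
from typing import List


def round_vec(vec) -> List[float]:
    # Round every component of the vector to 3 decimals
    return [round(float(x), 3) for x in vec]


round_vec([0.33333, 0.66667])  # -> [0.333, 0.667]
```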
- train() ExperimentExecution [source]
Performs the policy training for the given random seeds using linear programming
- Returns
the training metrics and the trained policies
- static update_metrics(metrics: Dict[str, List[Union[float, int]]], info: Dict[str, Union[float, int]]) Dict[str, List[Union[float, int]]] [source]
Updates a dict with aggregated metrics using new information from the environment
- Parameters
metrics – the dict with the aggregated metrics
info – the new information
- Returns
the updated dict
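Such aggregation typically appends each newly observed value to the running list for its metric key; a minimal sketch (an assumption about the shape of the update, not the agent's exact code):

```python
from typing import Dict, List, Union


def update_metrics(metrics: Dict[str, List[Union[float, int]]],
                   info: Dict[str, Union[float, int]]) -> Dict[str, List[Union[float, int]]]:
    # Append each new value from the environment's info dict to the
    # corresponding running list, creating the list on first occurrence
    for k, v in info.items():
        metrics.setdefault(k, []).append(v)
    return metrics


update_metrics({"avg_reward": [1.0]}, {"avg_reward": 2.0, "steps": 5})
# -> {'avg_reward': [1.0, 2.0], 'steps': [5]}
```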