csle_agents.agents.shapley_iteration package
Submodules
csle_agents.agents.shapley_iteration.shapley_iteration_agent module
- class csle_agents.agents.shapley_iteration.shapley_iteration_agent.ShapleyIterationAgent(simulation_env_config: csle_common.dao.simulation_config.simulation_env_config.SimulationEnvConfig, experiment_config: csle_common.dao.training.experiment_config.ExperimentConfig, training_job: Optional[csle_common.dao.jobs.training_job_config.TrainingJobConfig] = None, save_to_metastore: bool = True)
Bases: csle_agents.agents.base.base_agent.BaseAgent
Shapley Iteration Agent
- auxillary_game(V: numpy.ndarray[Any, numpy.dtype[Any]], gamma: float, S: numpy.ndarray[Any, numpy.dtype[Any]], s: int, A1: numpy.ndarray[Any, numpy.dtype[Any]], A2: numpy.ndarray[Any, numpy.dtype[Any]], R: numpy.ndarray[Any, numpy.dtype[Any]], T: numpy.ndarray[Any, numpy.dtype[Any]]) → numpy.ndarray[Any, numpy.dtype[Any]]
Creates an auxiliary matrix game for state s based on the value function V
- Parameters
V – the value function
gamma – the discount factor
S – the set of states
s – the state s
A1 – the set of actions of player 1
A2 – the set of actions of player 2
R – the reward tensor
T – the transition tensor
- Returns
the auxiliary matrix game
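The construction described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the helper name `auxiliary_game_sketch` is hypothetical, and the index layouts `R[a1, a2, s]` and `T[a1, a2, s, s']` are assumptions that the source does not confirm. Under those assumptions, each entry of the stage game is the immediate reward plus the discounted expected value of the next state.

```python
import numpy as np


def auxiliary_game_sketch(V, gamma, s, R, T):
    """Hypothetical helper: build the stage (auxiliary) game for state s.

    Assumed layouts (not confirmed by the source):
      R[a1, a2, s]      -- reward to player 1
      T[a1, a2, s, s2]  -- probability of moving from s to s2
    """
    # Expected continuation value per action pair: sum_s2 T[a1, a2, s, s2] * V[s2]
    continuation = T[:, :, s, :] @ V
    # Entry (a1, a2) = immediate reward + discounted continuation value
    return R[:, :, s] + gamma * continuation


# Toy example: two states, two actions per player.
R = np.zeros((2, 2, 2))
R[:, :, 0] = [[1.0, 0.0], [0.0, 1.0]]
T = np.zeros((2, 2, 2, 2))
T[:, :, :, 1] = 1.0  # every action pair leads to state 1
V = np.array([0.0, 2.0])

game = auxiliary_game_sketch(V, 0.5, 0, R, T)
```

In this toy setup every entry of the stage game for state 0 is the reward in state 0 plus 0.5 * V[1].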
- compute_matrix_game_value(A: numpy.ndarray[Any, numpy.dtype[Any]], A1: numpy.ndarray[Any, numpy.dtype[Any]], A2: numpy.ndarray[Any, numpy.dtype[Any]], maximizer: bool = True)
- Parameters
A – the matrix game
A1 – the set of actions of player 1
A2 – the set of actions of player 2
maximizer – a boolean flag indicating whether the maximin or minimax strategy should be computed
- Returns
a tuple (val(A), strategy): the value of the matrix game and the computed maximin or minimax strategy
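The value of a zero-sum matrix game and a maximin or minimax strategy can be obtained from a linear program. The sketch below uses `scipy.optimize.linprog`; which solver the library itself uses is not stated here, so treat this as an illustration only. The function name `matrix_game_value_sketch` is hypothetical, and the mapping of `maximizer=True` to the row player's maximin strategy is an assumption.

```python
import numpy as np
from scipy.optimize import linprog


def matrix_game_value_sketch(A, maximizer=True):
    """Hypothetical helper: returns (value, mixed strategy) of the matrix game A.

    Assumption: rows are player 1's actions (maximizer), columns player 2's.
    """
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    if maximizer:
        # Variables (x_1..x_m, v): maximize v subject to (A^T x)_j >= v for all j
        k = m
        c = np.append(np.zeros(k), -1.0)  # linprog minimizes, so minimize -v
        A_ub = np.hstack([-A.T, np.ones((n, 1))])
    else:
        # Variables (y_1..y_n, v): minimize v subject to (A y)_i <= v for all i
        k = n
        c = np.append(np.zeros(k), 1.0)
        A_ub = np.hstack([A, -np.ones((m, 1))])
    b_ub = np.zeros(A_ub.shape[0])
    A_eq = np.append(np.ones(k), 0.0).reshape(1, -1)  # probabilities sum to 1
    bounds = [(0, None)] * k + [(None, None)]  # v is unbounded
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=bounds, method="highs")
    return res.x[-1], res.x[:k]


# Matching pennies: the value is 0 and both players mix uniformly.
value, strategy = matrix_game_value_sketch([[1, -1], [-1, 1]])
min_value, min_strategy = matrix_game_value_sketch([[1, -1], [-1, 1]], maximizer=False)
```

By the minimax theorem, both calls yield the same game value; only the returned strategy differs.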
- shapley_iteration(exp_result: csle_common.dao.training.experiment_result.ExperimentResult, seed: int) → csle_common.dao.training.experiment_result.ExperimentResult
Runs the Shapley iteration algorithm
- Parameters
exp_result – the experiment result object
seed – the random seed
- Returns
the updated experiment result
- si(S: numpy.ndarray[Any, numpy.dtype[Any]], A1: numpy.ndarray[Any, numpy.dtype[Any]], A2: numpy.ndarray[Any, numpy.dtype[Any]], R: numpy.ndarray[Any, numpy.dtype[Any]], T: numpy.ndarray[Any, numpy.dtype[Any]], gamma: float = 1, max_iterations: int = 500, delta_threshold: float = 0.1) → Tuple[numpy.ndarray[Any, numpy.dtype[Any]], numpy.ndarray[Any, numpy.dtype[Any]], numpy.ndarray[Any, numpy.dtype[Any]], numpy.ndarray[Any, numpy.dtype[Any]], List[float]]
Shapley Iteration (L. Shapley 1953)
- Parameters
S – the set of states of the stochastic game (SG)
A1 – the set of actions of player 1 in the SG
A2 – the set of actions of player 2 in the SG
R – the reward tensor in the SG
T – the transition tensor in the SG
gamma – the discount factor
max_iterations – the maximum number of iterations
delta_threshold – the stopping threshold
- Returns
the value function, the set of maximin strategies for all stage games,
the set of minimax strategies for all stage games, and the stage games themselves
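The iteration itself can be sketched as repeated application of the stage-game value to every state until the largest change in V falls below the threshold. The code below is a simplified sketch under stated assumptions, not the library's `si` method: the names `maximin_value` and `shapley_iteration_sketch` are hypothetical, the layouts `R[a1, a2, s]` and `T[a1, a2, s, s']` are assumed, and only the value function and the per-iteration changes are returned (strategies and stage games are omitted for brevity). The example uses a discount factor below 1, for which the update is a contraction.

```python
import numpy as np
from scipy.optimize import linprog


def maximin_value(A):
    """Value of the zero-sum matrix game A for the row (maximizing) player."""
    m, n = A.shape
    c = np.append(np.zeros(m), -1.0)  # minimize -v, i.e. maximize v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])  # v <= (A^T x)_j for every column j
    A_eq = np.append(np.ones(m), 0.0).reshape(1, -1)
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n), A_eq=A_eq, b_eq=[1.0],
                  bounds=bounds, method="highs")
    return res.x[-1]


def shapley_iteration_sketch(R, T, gamma, max_iterations=500, delta_threshold=1e-6):
    """Hypothetical sketch; assumes R[a1, a2, s] and T[a1, a2, s, s2]."""
    num_states = R.shape[2]
    V = np.zeros(num_states)
    deltas = []
    for _ in range(max_iterations):
        # For every state, solve the stage game built from the current V
        V_next = np.array([
            maximin_value(R[:, :, s] + gamma * (T[:, :, s, :] @ V))
            for s in range(num_states)
        ])
        delta = float(np.max(np.abs(V_next - V)))
        deltas.append(delta)
        V = V_next
        if delta < delta_threshold:
            break
    return V, deltas


# Single-state game with a self-loop: the stage reward game has value 1,
# so the fixed point solves V = 1 + 0.5 * V.
R = np.zeros((2, 2, 1))
R[:, :, 0] = [[2.0, 0.0], [0.0, 2.0]]
T = np.ones((2, 2, 1, 1))
V, deltas = shapley_iteration_sketch(R, T, gamma=0.5)
```

In this example the change in V halves at each step, so the loop stops well before `max_iterations` and V[0] approaches 2.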