csle_agents.agents.shapley_iteration package
Submodules
csle_agents.agents.shapley_iteration.shapley_iteration_agent module
- class csle_agents.agents.shapley_iteration.shapley_iteration_agent.ShapleyIterationAgent(simulation_env_config: SimulationEnvConfig, experiment_config: ExperimentConfig, training_job: Optional[TrainingJobConfig] = None, save_to_metastore: bool = True)
Bases: BaseAgent
Shapley Iteration Agent
- auxillary_game(V: ndarray[Any, dtype[Any]], gamma: float, S: ndarray[Any, dtype[Any]], s: int, A1: ndarray[Any, dtype[Any]], A2: ndarray[Any, dtype[Any]], R: ndarray[Any, dtype[Any]], T: ndarray[Any, dtype[Any]]) → ndarray[Any, dtype[Any]]
Creates an auxiliary matrix game based on the value function V
- Parameters
V – the value function
gamma – the discount factor
S – the set of states
s – the state s
A1 – the set of actions of player 1
A2 – the set of actions of player 2
R – the reward tensor
T – the transition tensor
- Returns
the auxiliary matrix game
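As an illustration of what such an auxiliary game looks like, the following is a minimal sketch, not the package's implementation. It assumes the tensor layouts R[a1, a2, s] and T[a1, a2, s, s'], which this page does not specify, and the function name build_auxiliary_game is hypothetical. Each entry is the immediate reward plus the discounted expected value of the next state.

```python
import numpy as np


def build_auxiliary_game(V, gamma, s, R, T):
    """Hypothetical helper: builds the stage (auxiliary) matrix game for state s.

    Assumed layouts (not confirmed by the documentation):
      R[a1, a2, s]          immediate reward
      T[a1, a2, s, s_next]  transition probability
    """
    V = np.asarray(V, dtype=float)
    immediate = R[:, :, s]        # shape (|A1|, |A2|)
    expected_next = T[:, :, s, :] @ V  # sum over s_next of T * V, shape (|A1|, |A2|)
    return immediate + gamma * expected_next
```

The result is a |A1| x |A2| payoff matrix whose value, once solved as a zero-sum matrix game, gives the updated value of state s.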
- compute_matrix_game_value(A: ndarray[Any, dtype[Any]], A1: ndarray[Any, dtype[Any]], A2: ndarray[Any, dtype[Any]], maximizer: bool = True)
Computes the value of a matrix game together with the maximin or minimax strategy
- Parameters
A – the matrix game
A1 – the set of actions of player 1
A2 – the set of actions of player 2
maximizer – a boolean flag indicating whether the maximin or minimax strategy should be computed
- Returns
a tuple (val(A), maximin/minimax strategy)
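A standard way to obtain val(A) and an optimal mixed strategy is linear programming. The sketch below is an illustration under stated assumptions rather than the method used by this class: it relies on scipy.optimize.linprog, the name matrix_game_value is hypothetical, and the action sets A1 and A2 are inferred from the shape of A. It also assumes that maximizer=False means the minimax strategy of the column player is wanted.

```python
import numpy as np
from scipy.optimize import linprog


def matrix_game_value(A, maximizer=True):
    """Hypothetical helper: value and optimal mixed strategy of a zero-sum matrix game.

    The row player maximizes A, the column player minimizes it.
    """
    A = np.asarray(A, dtype=float)
    # The column player's problem is the row player's problem on -A^T.
    M = A if maximizer else -A.T
    n_rows, n_cols = M.shape

    # Variables: x_1..x_n (mixed strategy) and v (guaranteed payoff).
    # Maximize v  <=>  minimize -v.
    c = np.zeros(n_rows + 1)
    c[-1] = -1.0
    # For every opponent action j: v - sum_i x_i * M[i, j] <= 0.
    A_ub = np.hstack([-M.T, np.ones((n_cols, 1))])
    b_ub = np.zeros(n_cols)
    # The strategy must sum to one; v does not enter this constraint.
    A_eq = np.append(np.ones(n_rows), 0.0).reshape(1, -1)
    b_eq = np.array([1.0])
    bounds = [(0, None)] * n_rows + [(None, None)]

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    value = res.x[-1] if maximizer else -res.x[-1]
    return value, res.x[:-1]
```

For matching pennies, A = [[1, -1], [-1, 1]], both calls should return a value close to 0 and a strategy close to uniform.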
- shapley_iteration(exp_result: ExperimentResult, seed: int) → ExperimentResult
Runs the Shapley iteration algorithm
- Parameters
exp_result – the experiment result object
seed – the random seed
- Returns
the updated experiment result
- si(S: ndarray[Any, dtype[Any]], A1: ndarray[Any, dtype[Any]], A2: ndarray[Any, dtype[Any]], R: ndarray[Any, dtype[Any]], T: ndarray[Any, dtype[Any]], gamma: float = 1, max_iterations: int = 500, delta_threshold: float = 0.1) → Tuple[ndarray[Any, dtype[Any]], ndarray[Any, dtype[Any]], ndarray[Any, dtype[Any]], ndarray[Any, dtype[Any]], List[float]]
Shapley Iteration (L. Shapley 1953)
- Parameters
S – the set of states of the SG
A1 – the set of actions of player 1 in the SG
A2 – the set of actions of player 2 in the SG
R – the reward tensor in the SG
T – the transition tensor in the SG
gamma – the discount factor
max_iterations – the maximum number of iterations
delta_threshold – the stopping threshold
- Returns
a tuple of the value function, the set of maximin strategies for all stage games,
the set of minimax strategies for all stage games, the stage games themselves,
and a list of floats (the fifth element of the return type)
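The iteration itself can be sketched as repeated solving of one stage game per state until the value function stops changing. The code below is a simplified illustration, not the si method: the names shapley_iteration_sketch and maximin_value are hypothetical, the layouts R[a1, a2, s] and T[a1, a2, s, s'] are assumed, the stage games are solved with scipy.optimize.linprog, and only the value function and the per-iteration changes are returned.

```python
import numpy as np
from scipy.optimize import linprog


def maximin_value(M):
    """Hypothetical helper: value of the zero-sum matrix game M for the row player."""
    n_rows, n_cols = M.shape
    c = np.zeros(n_rows + 1)
    c[-1] = -1.0  # maximize v by minimizing -v
    A_ub = np.hstack([-M.T, np.ones((n_cols, 1))])  # v <= x^T M[:, j] for all j
    A_eq = np.append(np.ones(n_rows), 0.0).reshape(1, -1)  # strategy sums to 1
    bounds = [(0, None)] * n_rows + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n_cols), A_eq=A_eq,
                  b_eq=np.array([1.0]), bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[-1]


def shapley_iteration_sketch(R, T, gamma=0.9, max_iterations=500,
                             delta_threshold=1e-6):
    """Value iteration for a zero-sum stochastic game (assumed tensor layouts)."""
    n_states = R.shape[2]
    V = np.zeros(n_states)
    deltas = []
    for _ in range(max_iterations):
        V_next = np.empty(n_states)
        for s in range(n_states):
            # Stage game: immediate reward plus discounted expected next value.
            stage_game = R[:, :, s] + gamma * (T[:, :, s, :] @ V)
            V_next[s] = maximin_value(stage_game)
        delta = float(np.max(np.abs(V_next - V)))
        deltas.append(delta)
        V = V_next
        if delta <= delta_threshold:
            break
    return V, deltas
```

The sketch defaults to gamma < 1, the case in which the update is a contraction and the loop is guaranteed to converge; the documented default for si is gamma = 1, where the stopping threshold and the iteration cap are what bound the run.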