csle_agents.agents.shapley_iteration package

Submodules

csle_agents.agents.shapley_iteration.shapley_iteration_agent module

class csle_agents.agents.shapley_iteration.shapley_iteration_agent.ShapleyIterationAgent(simulation_env_config: SimulationEnvConfig, experiment_config: ExperimentConfig, training_job: Optional[TrainingJobConfig] = None, save_to_metastore: bool = True)[source]

Bases: BaseAgent

Shapley Iteration Agent

auxillary_game(V: ndarray[Any, dtype[Any]], gamma: float, S: ndarray[Any, dtype[Any]], s: int, A1: ndarray[Any, dtype[Any]], A2: ndarray[Any, dtype[Any]], R: ndarray[Any, dtype[Any]], T: ndarray[Any, dtype[Any]]) → ndarray[Any, dtype[Any]][source]

Creates an auxiliary matrix game for state s based on the value function V

Parameters

V – the value function
gamma – the discount factor
S – the set of states
s – the state s
A1 – the set of actions of player 1
A2 – the set of actions of player 2
R – the reward tensor
T – the transition tensor

Returns

the auxiliary matrix game
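
As a rough illustration of what such a construction can look like, the sketch below fills each entry of the matrix with the immediate reward plus the discounted expected value of the next state. The function name `auxiliary_game_sketch` is hypothetical, and the tensor layouts `R[a1, a2, s]` and `T[a1, a2, s, s_next]` are assumptions; the documentation above does not state the index order, so check it against the library source before relying on this.

```python
import numpy as np


def auxiliary_game_sketch(V, gamma, S, s, A1, A2, R, T):
    """Hypothetical stand-in for illustration, not the library's implementation."""
    A = np.zeros((len(A1), len(A2)))
    for a1 in A1:
        for a2 in A2:
            # Assumed layout: R[a1, a2, s] and T[a1, a2, s, s_next]
            expected_future = sum(T[a1, a2, s, s_next] * V[s_next] for s_next in S)
            A[a1, a2] = R[a1, a2, s] + gamma * expected_future
    return A


# Toy stochastic game: two states, two actions per player
S = np.array([0, 1])
A1 = np.array([0, 1])
A2 = np.array([0, 1])
R = np.zeros((2, 2, 2))
R[:, :, 0] = [[1, -1], [-1, 1]]   # rewards in state 0
T = np.zeros((2, 2, 2, 2))
T[:, :, :, 1] = 1.0               # every action pair leads to state 1
V = np.array([0.0, 2.0])

game = auxiliary_game_sketch(V, 0.5, S, 0, A1, A2, R, T)
print(game)
```

In this toy setup every entry equals the reward in state 0 plus 0.5 * V[1], since all transitions lead to state 1.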

compute_matrix_game_value(A: ndarray[Any, dtype[Any]], A1: ndarray[Any, dtype[Any]], A2: ndarray[Any, dtype[Any]], maximizer: bool = True)[source]

Computes the value of a matrix game together with the maximin or minimax strategy

Parameters

A – the matrix game
A1 – the set of actions of player 1
A2 – the set of actions of player 2
maximizer – a boolean flag indicating whether the maximin or minimax strategy should be computed

Returns

a tuple (val(A), strategy), where strategy is the maximin or the minimax strategy, depending on the maximizer flag
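
The value of a zero-sum matrix game is commonly obtained from a linear program. The sketch below is one way to do that with `scipy.optimize.linprog`; it is not taken from the library, which may use a different solver. The function name `matrix_game_value_sketch` is hypothetical, the action sets A1 and A2 are omitted because the matrix shape already determines them here, and the mapping of `maximizer=True` to the row player's maximin strategy is an assumption.

```python
import numpy as np
from scipy.optimize import linprog


def matrix_game_value_sketch(A, maximizer=True):
    """Hypothetical helper: returns (value, strategy) of a zero-sum matrix game."""
    A = np.asarray(A, dtype=float)
    if not maximizer:
        # The column player's minimax problem is a maximin problem in -A^T
        value, strategy = matrix_game_value_sketch(-A.T, maximizer=True)
        return -value, strategy
    m, n = A.shape
    # Decision variables: x_1..x_m (mixed strategy) and v (game value)
    c = np.zeros(m + 1)
    c[-1] = -1.0                                   # maximize v == minimize -v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])      # v - sum_i x_i A[i, j] <= 0 for all j
    b_ub = np.zeros(n)
    A_eq = np.ones((1, m + 1))
    A_eq[0, -1] = 0.0                              # probabilities sum to one
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]      # x >= 0, v unrestricted
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[-1], res.x[:m]


# Matching pennies: the value is zero and both players mix uniformly
pennies = np.array([[1.0, -1.0], [-1.0, 1.0]])
val_max, maximin = matrix_game_value_sketch(pennies, maximizer=True)
val_min, minimax = matrix_game_value_sketch(pennies, maximizer=False)
```

Both calls return the same game value; only the returned strategy differs (row player versus column player).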

hparam_names() → List[str][source]

Returns: a list with the hyperparameter names

shapley_iteration(exp_result: ExperimentResult, seed: int) → ExperimentResult[source]

Runs the Shapley iteration algorithm

Parameters

exp_result – the experiment result object
seed – the random seed

Returns

the updated experiment result

si(S: ndarray[Any, dtype[Any]], A1: ndarray[Any, dtype[Any]], A2: ndarray[Any, dtype[Any]], R: ndarray[Any, dtype[Any]], T: ndarray[Any, dtype[Any]], gamma: float = 1, max_iterations: int = 500, delta_threshold: float = 0.1) → Tuple[ndarray[Any, dtype[Any]], ndarray[Any, dtype[Any]], ndarray[Any, dtype[Any]], ndarray[Any, dtype[Any]], List[float]][source]

Shapley Iteration (L. Shapley 1953)

Parameters

S – the set of states of the SG
A1 – the set of actions of player 1 in the SG
A2 – the set of actions of player 2 in the SG
R – the reward tensor in the SG
T – the transition tensor in the SG
gamma – the discount factor
max_iterations – the maximum number of iterations
delta_threshold – the stopping threshold

Returns

the value function, the set of maximin strategies for all stage games, the set of minimax strategies for all stage games, and the stage games themselves
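
To make the iteration concrete, here is a minimal sketch of Shapley's value update: for every state, build the stage game from the current value estimate, replace the estimate with the value of that matrix game, and stop once the change falls below a threshold. This is an illustration under stated assumptions, not the library's `si` method: the names `maximin_value` and `shapley_iteration_sketch` are hypothetical, the tensor layouts `R[a1, a2, s]` and `T[a1, a2, s, s_next]` are assumed, the stopping rule (largest absolute change per sweep) is an assumption, the matrix games are solved with `scipy.optimize.linprog`, and only the value function and the per-sweep deltas are returned.

```python
import numpy as np
from scipy.optimize import linprog


def maximin_value(A):
    """Value and maximin strategy of the row player, via a linear program."""
    m, n = A.shape
    c = np.zeros(m + 1)
    c[-1] = -1.0                                   # maximize the game value v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])      # v <= x^T A[:, j] for all j
    A_eq = np.ones((1, m + 1))
    A_eq[0, -1] = 0.0                              # strategy sums to one
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n), A_eq=A_eq, b_eq=[1.0], bounds=bounds)
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[-1], res.x[:m]


def shapley_iteration_sketch(S, R, T, gamma=0.9, max_iterations=500, delta_threshold=1e-6):
    """Hypothetical sketch; assumes R[a1, a2, s] and T[a1, a2, s, s_next]."""
    V = np.zeros(len(S))
    deltas = []
    for _ in range(max_iterations):
        V_next = np.zeros(len(S))
        for s in S:
            # Stage game: immediate reward plus discounted expected next value
            stage_game = R[:, :, s] + gamma * (T[:, :, s, :] @ V)
            V_next[s], _ = maximin_value(stage_game)
        delta = float(np.max(np.abs(V_next - V)))
        deltas.append(delta)
        V = V_next
        if delta <= delta_threshold:
            break
    return V, deltas


# Single-state toy game with a self-loop: V solves V = 2 + 0.5 * V
S = np.array([0])
R = np.zeros((2, 2, 1))
R[:, :, 0] = [[3.0, 1.0], [1.0, 3.0]]
T = np.ones((2, 2, 1, 1))
V, deltas = shapley_iteration_sketch(S, R, T, gamma=0.5)
```

Note that the sketch uses a discount factor below one and a tight threshold for the toy example; these differ from the defaults in the signature above (`gamma=1`, `delta_threshold=0.1`).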

train() → ExperimentExecution[source]

Runs the Shapley iteration algorithm to compute V*

Returns: the experiment execution containing the results
