csle_agents.agents.shapley_iteration package

Submodules

csle_agents.agents.shapley_iteration.shapley_iteration_agent module

class csle_agents.agents.shapley_iteration.shapley_iteration_agent.ShapleyIterationAgent(simulation_env_config: csle_common.dao.simulation_config.simulation_env_config.SimulationEnvConfig, experiment_config: csle_common.dao.training.experiment_config.ExperimentConfig, training_job: Optional[csle_common.dao.jobs.training_job_config.TrainingJobConfig] = None, save_to_metastore: bool = True)[source]

Bases: csle_agents.agents.base.base_agent.BaseAgent

Shapley Iteration Agent

auxillary_game(V: numpy.ndarray[Any, numpy.dtype[Any]], gamma: float, S: numpy.ndarray[Any, numpy.dtype[Any]], s: int, A1: numpy.ndarray[Any, numpy.dtype[Any]], A2: numpy.ndarray[Any, numpy.dtype[Any]], R: numpy.ndarray[Any, numpy.dtype[Any]], T: numpy.ndarray[Any, numpy.dtype[Any]]) numpy.ndarray[Any, numpy.dtype[Any]][source]

Creates an auxiliary matrix game for state s based on the value function V

Parameters
  • V – the value function

  • gamma – the discount factor

  • S – the set of states

  • s – the state s

  • A1 – the set of actions of player 1

  • A2 – the set of actions of player 2

  • R – the reward tensor

  • T – the transition tensor

Returns

the auxiliary matrix game
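As a rough illustration of the construction described above, each entry of the stage game can be read as the immediate reward plus the discounted expected value of the next state. The sketch below is not the library's implementation: the function name `auxiliary_game_sketch` is hypothetical, and the index layouts `R[a1, a2, s]` and `T[a1, a2, s, s_prime]` are assumptions that should be checked against the source. The `S`, `A1` and `A2` arguments of the documented method are omitted because the array shapes carry the same information here.

```python
import numpy as np


def auxiliary_game_sketch(V, gamma, s, R, T):
    """Builds the stage game at state s: R[a1, a2, s] + gamma * E[V(s')]."""
    # Assumed layouts (not confirmed by the docs): R[a1, a2, s], T[a1, a2, s, s_prime].
    # T[:, :, s, :] has shape (|A1|, |A2|, |S|); the product with V sums over s_prime.
    expected_future = T[:, :, s, :] @ V
    return R[:, :, s] + gamma * expected_future


# Toy game: two actions per player, two states, uniform transitions.
R = np.zeros((2, 2, 2))
R[:, :, 0] = [[1.0, -1.0], [-1.0, 1.0]]
T = np.full((2, 2, 2, 2), 0.5)
V = np.array([2.0, 4.0])

A = auxiliary_game_sketch(V, gamma=0.5, s=0, R=R, T=T)
print(A)
```

In this toy game the expected next-state value is 3 for every action pair, so each entry of the reward matrix is shifted by 0.5 * 3 = 1.5.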

compute_matrix_game_value(A: numpy.ndarray[Any, numpy.dtype[Any]], A1: numpy.ndarray[Any, numpy.dtype[Any]], A2: numpy.ndarray[Any, numpy.dtype[Any]], maximizer: bool = True)[source]
Parameters
  • A – the matrix game

  • A1 – the set of actions of player 1

  • A2 – the set of actions of player 2

  • maximizer – a boolean flag indicating whether the maximin or minimax strategy should be computed

Returns

a tuple (val(A), strategy), where strategy is the maximin strategy if maximizer is True and the minimax strategy otherwise
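The value of a zero-sum matrix game and a maximin strategy can be obtained from a linear program. The sketch below shows one way to do this with `scipy.optimize.linprog`. It is an illustration only: the function name `matrix_game_value_sketch` is hypothetical, the solver choice is an assumption and may differ from what the library uses, and the `A1` and `A2` arguments of the documented method are replaced by the shape of `A`.

```python
import numpy as np
from scipy.optimize import linprog


def matrix_game_value_sketch(A, maximizer=True):
    """Returns (val(A), strategy) for the row player (maximin) or column player (minimax)."""
    A = np.asarray(A, dtype=float)
    if not maximizer:
        # The minimizing column player of A is the maximizing row player of -A^T.
        value, strategy = matrix_game_value_sketch(-A.T, maximizer=True)
        return -value, strategy
    m, n = A.shape
    # Decision variables: x_1..x_m (mixed strategy) and v (guaranteed payoff).
    c = np.zeros(m + 1)
    c[-1] = -1.0  # linprog minimizes, so minimize -v to maximize v
    # For every column j: v - sum_i x_i * A[i, j] <= 0
    A_ub = np.hstack([-A.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    # The strategy must be a probability distribution.
    A_eq = np.append(np.ones(m), 0.0).reshape(1, -1)
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]  # x >= 0, v unrestricted
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return float(res.x[-1]), res.x[:m]


# Matching pennies: the value is 0 and the maximin strategy is uniform.
value, x = matrix_game_value_sketch([[1, -1], [-1, 1]])
```

Reusing the maximizer program on `-A.T` avoids writing a second linear program for the minimizing player.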

hparam_names() List[str][source]
Returns

a list with the hyperparameter names

shapley_iteration(exp_result: csle_common.dao.training.experiment_result.ExperimentResult, seed: int) csle_common.dao.training.experiment_result.ExperimentResult[source]

Runs the Shapley iteration algorithm

Parameters
  • exp_result – the experiment result object

  • seed – the random seed

Returns

the updated experiment result

si(S: numpy.ndarray[Any, numpy.dtype[Any]], A1: numpy.ndarray[Any, numpy.dtype[Any]], A2: numpy.ndarray[Any, numpy.dtype[Any]], R: numpy.ndarray[Any, numpy.dtype[Any]], T: numpy.ndarray[Any, numpy.dtype[Any]], gamma: float = 1, max_iterations: int = 500, delta_threshold: float = 0.1) Tuple[numpy.ndarray[Any, numpy.dtype[Any]], numpy.ndarray[Any, numpy.dtype[Any]], numpy.ndarray[Any, numpy.dtype[Any]], numpy.ndarray[Any, numpy.dtype[Any]], List[float]][source]

Shapley Iteration (L. Shapley 1953)

Parameters
  • S – the set of states of the SG

  • A1 – the set of actions of player 1 in the SG

  • A2 – the set of actions of player 2 in the SG

  • R – the reward tensor in the SG

  • T – the transition tensor in the SG

  • gamma – the discount factor

  • max_iterations – the maximum number of iterations

  • delta_threshold – the stopping threshold

Returns

the value function, the set of maximin strategies for all stage games, the set of minimax strategies for all stage games, and the stage games themselves
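To make the iteration concrete, the following sketch repeats two steps until the value function stops changing: build the stage game of every state from the current V, then replace V(s) with the value of that stage game. It is a simplified illustration rather than the library's code. The names `game_value` and `shapley_iteration_sketch` are hypothetical, the tensor layouts `R[a1, a2, s]` and `T[a1, a2, s, s_prime]` are assumptions, the defaults `gamma=0.9` and `delta_threshold=1e-6` differ from the documented ones, and only the value function and the per-iteration changes are returned.

```python
import numpy as np
from scipy.optimize import linprog


def game_value(A):
    """Value of the zero-sum matrix game A for the maximizing row player (via an LP)."""
    m, n = A.shape
    c = np.append(np.zeros(m), -1.0)  # maximize v by minimizing -v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])  # v <= sum_i x_i * A[i, j] for every j
    A_eq = np.append(np.ones(m), 0.0).reshape(1, -1)  # strategy sums to one
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n), A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * m + [(None, None)])
    return float(res.x[-1])


def shapley_iteration_sketch(R, T, gamma=0.9, max_iterations=500, delta_threshold=1e-6):
    """Iterates V(s) <- val(R[:, :, s] + gamma * sum_s' T[:, :, s, s'] * V(s'))."""
    num_states = R.shape[2]
    V = np.zeros(num_states)
    deltas = []
    for _ in range(max_iterations):
        # Stage games of all states at once, shape (|A1|, |A2|, |S|).
        games = R + gamma * np.einsum("abst,t->abs", T, V)
        V_new = np.array([game_value(games[:, :, s]) for s in range(num_states)])
        delta = float(np.max(np.abs(V_new - V)))
        deltas.append(delta)
        V = V_new
        if delta <= delta_threshold:
            break
    return V, deltas


# Single-state game whose stage reward matrix has value 1, so V* = 1 / (1 - gamma) = 10.
R = np.zeros((2, 2, 1))
R[:, :, 0] = [[2.0, 0.0], [0.0, 2.0]]
T = np.ones((2, 2, 1, 1))
V, deltas = shapley_iteration_sketch(R, T, gamma=0.9)
```

With a discount factor below 1 the update is a contraction, so the recorded changes shrink geometrically. With `gamma = 1`, the documented default, that guarantee no longer holds in general.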

train() csle_common.dao.training.experiment_execution.ExperimentExecution[source]

Runs the Shapley iteration algorithm to compute V*

Returns

the results

Module contents