gym_csle_stopping_game.util package

Submodules

gym_csle_stopping_game.util.stopping_game_util module

class gym_csle_stopping_game.util.stopping_game_util.StoppingGameUtil[source]

Bases: object

Class with utility functions for the StoppingGame Environment

static aggregate_belief_mdp_defender(aggregation_resolution: int, T: ndarray[Any, dtype[float64]], R: ndarray[Any, dtype[float64]], Z: ndarray[Any, dtype[float64]], S: ndarray[Any, dtype[int64]], A: ndarray[Any, dtype[int64]], O: ndarray[Any, dtype[int64]]) Tuple[ndarray[Any, dtype[float64]], ndarray[Any, dtype[int64]], ndarray[Any, dtype[float64]], ndarray[Any, dtype[float64]]][source]

Generates an aggregate belief MDP from a given POMDP specification and aggregation resolution

Parameters
  • aggregation_resolution – the belief aggregation resolution

  • T – the transition tensor of the POMDP

  • R – the reward tensor of the POMDP

  • Z – the observation tensor of the POMDP

  • S – the state space of the POMDP

  • A – the action space of the POMDP

  • O – the observation space of the POMDP

Returns

the state space (aggregate belief space), action space, transition operator, and reward tensor of the aggregate belief MDP
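
A minimal usage sketch on a toy two-state POMDP; the tensor shapes below (T: |A|x|S|x|S|, Z: |A|x|S|x|O|, R: |A|x|S|) and the order of the returned tuple are assumptions made for illustration, not guarantees of the API:

  import numpy as np
  from gym_csle_stopping_game.util.stopping_game_util import StoppingGameUtil

  # Toy 2-state, 2-action, 2-observation POMDP (shapes are assumptions)
  S = np.array([0, 1])
  A = np.array([0, 1])
  O = np.array([0, 1])
  T = np.array([[[0.9, 0.1], [0.2, 0.8]],
                [[1.0, 0.0], [1.0, 0.0]]])   # T[a][s][s']
  Z = np.array([[[0.8, 0.2], [0.3, 0.7]],
                [[0.8, 0.2], [0.3, 0.7]]])   # Z[a][s'][o]
  R = np.array([[1.0, -1.0], [-0.5, 2.0]])   # R[a][s]

  belief_space, belief_A, belief_T, belief_R = StoppingGameUtil.aggregate_belief_mdp_defender(
      aggregation_resolution=10, T=T, R=R, Z=Z, S=S, A=A, O=O)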

static aggregate_belief_transition_probability(b1: ndarray[Any, dtype[float64]], b2: ndarray[Any, dtype[float64]], a: int, S: ndarray[Any, dtype[int64]], O: ndarray[Any, dtype[int64]], A: ndarray[Any, dtype[int64]], T: ndarray[Any, dtype[float64]], Z: ndarray[Any, dtype[float64]], aggregate_belief_space: ndarray[Any, dtype[float64]]) float[source]

Calculates the probability of transitioning from belief b1 to belief b2 when taking action a

Parameters
  • b1 – the source belief

  • b2 – the target belief

  • a – the action

  • S – the state space of the POMDP

  • O – the observation space of the POMDP

  • A – the action space of the POMDP

  • T – the transition operator

  • Z – the observation tensor

  • aggregate_belief_space – the aggregate belief space

Returns

the probability P(b2 | b1, a)
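
A hedged sketch reusing the toy POMDP tensors from the example above. For a fixed source belief and action, the transition probabilities over all target beliefs in the aggregate belief space are expected to sum to (approximately) one:

  B = StoppingGameUtil.generate_aggregate_belief_space(n=10, belief_space_dimension=len(S))
  prob_mass = sum(
      StoppingGameUtil.aggregate_belief_transition_probability(
          b1=B[0], b2=b2, a=0, S=S, O=O, A=A, T=T, Z=Z, aggregate_belief_space=B)
      for b2 in B)
  # prob_mass should be close to 1.0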

static attacker_actions() ndarray[Any, dtype[int64]][source]

Gets the action space of the attacker

Returns

the action space of the attacker

static b1() ndarray[Any, dtype[float64]][source]

Gets the initial belief

Returns

the initial belief

static bayes_filter(s_prime: int, o: int, a1: int, b: ndarray[Any, dtype[float64]], pi2: ndarray[Any, dtype[Any]], l: int, config: StoppingGameConfig) float[source]

A Bayesian filter that computes player 1's belief of being in state s_prime when observing o after taking action a1 in belief b, given that the opponent follows strategy pi2

Parameters
  • s_prime – the state to compute the belief of

  • o – the observation

  • a1 – the action of player 1

  • b – the current belief point

  • pi2 – the policy of player 2

  • l – the number of stops remaining

  • config – the game config

Returns

b_prime(s_prime)

static defender_actions() ndarray[Any, dtype[int64]][source]

Gets the action space of the defender

Returns

the action space of the defender

static find_nearest_neighbor_belief(belief_space: ndarray[Any, dtype[float64]], target_belief: ndarray[Any, dtype[float64]]) ndarray[Any, dtype[float64]][source]

Finds the nearest neighbor (in the Euclidean sense) of a given belief in a certain belief space

Parameters
  • belief_space – the belief space to search in

  • target_belief – the belief to find the nearest neighbor of

Returns

the nearest neighbor belief from the belief space
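
A minimal sketch of the nearest-neighbor lookup on a small hand-written belief space:

  import numpy as np
  from gym_csle_stopping_game.util.stopping_game_util import StoppingGameUtil

  belief_space = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
  target_belief = np.array([0.6, 0.4])
  nearest = StoppingGameUtil.find_nearest_neighbor_belief(
      belief_space=belief_space, target_belief=target_belief)
  # nearest is expected to be [0.5, 0.5], the closest point in Euclidean distance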

static generate_aggregate_belief_reward_tensor(aggregate_belief_space: ndarray[Any, dtype[float64]], S: ndarray[Any, dtype[int64]], A: ndarray[Any, dtype[int64]], R: ndarray[Any, dtype[float64]]) ndarray[Any, dtype[float64]][source]

Generates an aggregate reward tensor for the aggregate belief MDP

Parameters
  • aggregate_belief_space – the aggregate belief space

  • S – the state space of the POMDP

  • A – the action space of the POMDP

  • R – the reward tensor of the POMDP

Returns

the reward tensor of the aggregate belief MDP

static generate_aggregate_belief_space(n: int, belief_space_dimension: int) ndarray[Any, dtype[float64]][source]

Generates an aggregate belief space B_n

Parameters
  • n – the aggregation resolution

  • belief_space_dimension – the belief space dimension

Returns

the aggregate belief space
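
A short sketch; the returned grid is assumed to consist of belief points whose coordinates are multiples of 1/n and sum to one:

  B3 = StoppingGameUtil.generate_aggregate_belief_space(n=4, belief_space_dimension=3)
  # Each row of B3 is a belief over 3 states, e.g. [0.25, 0.5, 0.25];
  # together the rows form a uniform grid over the probability simplex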

static generate_aggregate_belief_transition_operator(aggregate_belief_space: ndarray[Any, dtype[float64]], S: ndarray[Any, dtype[int64]], A: ndarray[Any, dtype[int64]], O: ndarray[Any, dtype[int64]], T: ndarray[Any, dtype[float64]], Z: ndarray[Any, dtype[float64]]) ndarray[Any, dtype[float64]][source]

Generates an aggregate belief space transition operator

Parameters
  • aggregate_belief_space – the aggregate belief space

  • O – the observation space of the POMDP

  • S – the state space of the POMDP

  • A – the action space of the POMDP

  • T – the transition operator of the POMDP

  • Z – the observation tensor of the POMDP

Returns

the aggregate belief space transition operator
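
A hedged sketch, continuing with the toy POMDP tensors and the aggregate belief space B from the examples above; the indexing convention in the comment is an assumption:

  belief_T = StoppingGameUtil.generate_aggregate_belief_transition_operator(
      aggregate_belief_space=B, S=S, A=A, O=O, T=T, Z=Z)
  # belief_T[a][i][j] is assumed to approximate P(B[j] | B[i], a);
  # for each fixed a and i, the entries over j should sum to roughly one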

static next_belief(o: int, a1: int, b: ndarray[Any, dtype[float64]], pi2: ndarray[Any, dtype[Any]], config: StoppingGameConfig, l: int, a2: int = 0, s: int = 0) ndarray[Any, dtype[float64]][source]

Computes the next belief using a Bayesian filter

Parameters
  • o – the latest observation

  • a1 – the latest action of player 1

  • b – the current belief

  • pi2 – the policy of player 2

  • config – the game config

  • l – stops remaining

  • a2 – the attacker action (for debugging, should be consistent with pi2)

  • s – the true state (for debugging)

Returns

the new belief

static observation_space(n)[source]

Returns the observation space of size n

Parameters

n – the maximum observation

Returns

the observation space

static observation_tensor(n)[source]

Gets the observation tensor

Parameters

n – the maximum observation

Returns

a |A1|x|A2|x|S|x|O| tensor

static pomdp_bayes_filter(s_prime: int, o: int, a: int, b: ndarray[Any, dtype[float64]], states: ndarray[Any, dtype[int64]], observations: ndarray[Any, dtype[int64]], observation_tensor: ndarray[Any, dtype[float64]], transition_tensor: ndarray[Any, dtype[float64]]) float[source]

A Bayesian filter to compute b[s_prime] of the POMDP

Parameters
  • s_prime – the state to compute the belief for

  • o – the latest observation

  • a – the latest action

  • b – the current belief

  • states – the list of states

  • observations – the list of observations

  • observation_tensor – the observation tensor

  • transition_tensor – the transition tensor of the POMDP

Returns

b[s_prime]

static pomdp_next_belief(o: int, a: int, b: ndarray[Any, dtype[float64]], states: ndarray[Any, dtype[int64]], observations: ndarray[Any, dtype[int64]], observation_tensor: ndarray[Any, dtype[float64]], transition_tensor: ndarray[Any, dtype[float64]]) ndarray[Any, dtype[float64]][source]

Computes the next belief of the POMDP using a Bayesian filter

Parameters
  • o – the latest observation

  • a – the latest action of player 1

  • b – the current belief

  • states – the list of states

  • observations – the list of observations

  • observation_tensor – the observation tensor

  • transition_tensor – the transition tensor

Returns

the new belief
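
A minimal sketch of a belief update on a toy single-action POMDP; the tensor shapes (transition: |A|x|S|x|S|, observation: |A|x|S|x|O|) are assumptions for illustration:

  import numpy as np
  from gym_csle_stopping_game.util.stopping_game_util import StoppingGameUtil

  states = np.array([0, 1])
  observations = np.array([0, 1])
  transition_tensor = np.array([[[0.9, 0.1], [0.0, 1.0]]])    # T[a][s][s']
  observation_tensor = np.array([[[0.7, 0.3], [0.2, 0.8]]])   # Z[a][s'][o]
  b = np.array([1.0, 0.0])
  b_prime = StoppingGameUtil.pomdp_next_belief(
      o=1, a=0, b=b, states=states, observations=observations,
      observation_tensor=observation_tensor, transition_tensor=transition_tensor)
  # b_prime[s'] is proportional to Z[a][s'][o] * sum_s T[a][s][s'] * b[s]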

static pomdp_solver_file(config: StoppingGameConfig, discount_factor: float, pi2: ndarray[Any, dtype[Any]]) str[source]

Gets the POMDP environment specification based on the format at http://www.pomdp.org/code/index.html, for the defender’s local problem against a static attacker

Parameters
  • config – the POMDP config

  • discount_factor – the discount factor

  • pi2 – the attacker strategy

Returns

the file content as a string

static reduce_R_attacker(R: ndarray[Any, dtype[float64]], strategy: Policy) ndarray[Any, dtype[float64]][source]

Reduces the reward tensor based on a given attacker strategy

Parameters
  • R – the reward tensor to reduce

  • strategy – the strategy to use for the reduction

Returns

the reduced reward tensor (|A1|x|S|)
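
The reduction presumably marginalizes out the attacker action under the given strategy; the same pattern applies to the other reduce_* functions (with the roles of the players swapped for the defender variants). The actual method takes a Policy object; the numpy sketch below only illustrates the assumed underlying computation, with pi2 a hypothetical |S|x|A2| matrix of attacker action probabilities:

  import numpy as np

  # Illustration only (not the actual Policy API): R is |A1|x|A2|x|S|,
  # pi2[s][a2] is a hypothetical attacker action distribution per state
  R = np.arange(12).reshape((2, 2, 3)).astype(float)
  pi2 = np.array([[0.5, 0.5], [0.9, 0.1], [1.0, 0.0]])
  # R_reduced[a1][s] = sum_{a2} pi2[s][a2] * R[a1][a2][s]
  R_reduced = np.einsum('ijs,sj->is', R, pi2)   # shape |A1|x|S|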

static reduce_R_defender(R: ndarray[Any, dtype[float64]], strategy: Policy) ndarray[Any, dtype[float64]][source]

Reduces the reward tensor based on a given defender strategy

Parameters
  • R – the reward tensor to reduce

  • strategy – the strategy to use for the reduction

Returns

the reduced reward tensor (|A2|x|S|)

static reduce_T_attacker(T: ndarray[Any, dtype[float64]], strategy: Policy) ndarray[Any, dtype[float64]][source]

Reduces the transition tensor based on a given attacker strategy

Parameters
  • T – the tensor to reduce

  • strategy – the strategy to use for the reduction

Returns

the reduced tensor (|A1|x|S|x|S|)

static reduce_T_defender(T: ndarray[Any, dtype[float64]], strategy: Policy) ndarray[Any, dtype[float64]][source]

Reduces the transition tensor based on a given defender strategy

Parameters
  • T – the tensor to reduce

  • strategy – the strategy to use for the reduction

Returns

the reduced tensor (|A2|x|S|x|S|)

static reduce_Z_attacker(Z: ndarray[Any, dtype[float64]], strategy: Policy) ndarray[Any, dtype[float64]][source]

Reduces the observation tensor based on a given attacker strategy

Parameters
  • Z – the observation tensor to reduce

  • strategy – the strategy to use for the reduction

Returns

the reduced observation tensor (|A1|x|S|x|O|)

static reward_tensor(R_SLA: int, R_INT: int, R_COST: int, L: int, R_ST: int) ndarray[Any, dtype[Any]][source]

Gets the reward tensor

Parameters
  • R_SLA – the R_SLA constant

  • R_INT – the R_INT constant

  • R_COST – the R_COST constant

  • L – the maximum number of stop actions

  • R_ST – the R_ST constant

Returns

a |L|x|A1|x|A2|x|S| tensor

static sample_attacker_action(pi2: ndarray[Any, dtype[Any]], s: int) int[source]

Samples the attacker action

Parameters
  • pi2 – the attacker policy

  • s – the game state

Returns

a2, the sampled attacker action

static sample_initial_state(b1: ndarray[Any, dtype[float64]]) int[source]

Samples the initial state

Parameters

b1 – the initial belief

Returns

s1, the sampled initial state

static sample_next_observation(Z: ndarray[Any, dtype[Any]], s_prime: int, O: ndarray[Any, dtype[int64]]) int[source]

Samples the next observation

Parameters
  • Z – the observation tensor, which includes the observation probabilities

  • s_prime – the new state

  • O – the observation space

Returns

o

static sample_next_state(T: ndarray[Any, dtype[Any]], l: int, s: int, a1: int, a2: int, S: ndarray[Any, dtype[int64]]) int[source]

Samples the next state

Parameters
  • T – the transition operator

  • s – the current state

  • a1 – the defender action

  • a2 – the attacker action

  • S – the state space

  • l – the number of stops remaining

Returns

s’
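
A hedged end-to-end sampling sketch combining the sampling utilities and the tensor constructors in this module; pi2 below is a hypothetical stationary attacker policy (a |S|x|A2| matrix), and the particular values chosen for n, L, l, and a1 are illustrative assumptions:

  import numpy as np
  from gym_csle_stopping_game.util.stopping_game_util import StoppingGameUtil

  L = 3
  S = StoppingGameUtil.state_space()
  A2 = StoppingGameUtil.attacker_actions()
  O = StoppingGameUtil.observation_space(n=10)
  T = StoppingGameUtil.transition_tensor(L=L)
  Z = StoppingGameUtil.observation_tensor(n=10)
  b = StoppingGameUtil.b1()

  s = StoppingGameUtil.sample_initial_state(b1=b)
  pi2 = np.ones((len(S), len(A2))) / len(A2)   # hypothetical uniform attacker policy
  a2 = StoppingGameUtil.sample_attacker_action(pi2=pi2, s=s)
  s_prime = StoppingGameUtil.sample_next_state(T=T, l=1, s=s, a1=0, a2=a2, S=S)
  o = StoppingGameUtil.sample_next_observation(Z=Z, s_prime=s_prime, O=O)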

static state_space()[source]

Gets the state space

Returns

the state space of the game

static transition_tensor(L: int) ndarray[Any, dtype[Any]][source]

Gets the transition tensor

Parameters

L – the maximum number of stop actions

Returns

a |L|x|A1|x|A2|x|S|x|S| tensor

Module contents