gym_csle_stopping_game.util package
Submodules
gym_csle_stopping_game.util.stopping_game_util module
- class gym_csle_stopping_game.util.stopping_game_util.StoppingGameUtil[source]
Bases: object
Class with utility functions for the StoppingGame Environment
- static aggregate_belief_mdp_defender(aggregation_resolution: int, T: ndarray[Any, dtype[float64]], R: ndarray[Any, dtype[float64]], Z: ndarray[Any, dtype[float64]], S: ndarray[Any, dtype[int64]], A: ndarray[Any, dtype[int64]], O: ndarray[Any, dtype[int64]]) Tuple[ndarray[Any, dtype[float64]], ndarray[Any, dtype[int64]], ndarray[Any, dtype[float64]], ndarray[Any, dtype[float64]]] [source]
Generates an aggregate belief MDP from a given POMDP specification and aggregation resolution
- Parameters
aggregation_resolution – the belief aggregation resolution
T – the transition tensor of the POMDP
R – the reward tensor of the POMDP
Z – the observation tensor of the POMDP
S – the state space of the POMDP
A – the action space of the POMDP
O – the observation space of the POMDP
- Returns
the state space, action space, transition operator, and belief operator of the belief MDP
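A minimal usage sketch (not part of the API reference): it builds an aggregate belief MDP for a hypothetical 2-state, 2-action, 2-observation POMDP. The concrete tensor values and the assumed index conventions (T: |A|x|S|x|S|, R: |A|x|S|, Z: |A|x|S|x|O|) are illustrative assumptions, not values prescribed by the library:

import numpy as np
from gym_csle_stopping_game.util.stopping_game_util import StoppingGameUtil

# Hypothetical toy POMDP (all dimensions of size 2)
S = np.array([0, 1])
A = np.array([0, 1])
O = np.array([0, 1])
T = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.4, 0.6]]])   # assumed shape |A| x |S| x |S|
R = np.array([[1.0, -1.0],
              [0.5, -0.5]])                # assumed shape |A| x |S|
Z = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.6, 0.4], [0.5, 0.5]]])   # assumed shape |A| x |S| x |O|

# Returns the components of the aggregate belief MDP documented above
belief_mdp = StoppingGameUtil.aggregate_belief_mdp_defender(
    aggregation_resolution=10, T=T, R=R, Z=Z, S=S, A=A, O=O)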
- static aggregate_belief_transition_probability(b1: ndarray[Any, dtype[float64]], b2: ndarray[Any, dtype[float64]], a: int, S: ndarray[Any, dtype[int64]], O: ndarray[Any, dtype[int64]], A: ndarray[Any, dtype[int64]], T: ndarray[Any, dtype[float64]], Z: ndarray[Any, dtype[float64]], aggregate_belief_space: ndarray[Any, dtype[float64]]) float [source]
Calculates the probability of transitioning from belief b1 to belief b2 when taking action a
- Parameters
b1 – the source belief
b2 – the target belief
a – the action
S – the state space of the POMDP
O – the observation space of the POMDP
A – the action space of the POMDP
T – the transition operator
Z – the observation tensor
aggregate_belief_space – the aggregate belief space
- Returns
the probability P(b2 | b1, a)
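The sketch below evaluates the transition probability between two aggregate beliefs, using the same hypothetical toy POMDP and shape assumptions as in the previous example:

import numpy as np
from gym_csle_stopping_game.util.stopping_game_util import StoppingGameUtil

S = np.array([0, 1])
A = np.array([0, 1])
O = np.array([0, 1])
T = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.4, 0.6]]])
Z = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.6, 0.4], [0.5, 0.5]]])

# Aggregate belief space over the two states (resolution 10 is a hypothetical choice)
B = StoppingGameUtil.generate_aggregate_belief_space(n=10, belief_space_dimension=len(S))
p = StoppingGameUtil.aggregate_belief_transition_probability(
    b1=B[0], b2=B[1], a=0, S=S, O=O, A=A, T=T, Z=Z, aggregate_belief_space=B)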
- static attacker_actions() ndarray[Any, dtype[int64]] [source]
Gets the action space of the attacker
- Returns
the action space of the attacker
- static b1() ndarray[Any, dtype[float64]] [source]
Gets the initial belief
- Returns
the initial belief
- static bayes_filter(s_prime: int, o: int, a1: int, b: ndarray[Any, dtype[float64]], pi2: ndarray[Any, dtype[Any]], l: int, config: StoppingGameConfig) float [source]
A Bayesian filter to compute the belief of player 1 that the state is s_prime when observing o after taking action a1 in belief b, given that the opponent follows strategy pi2
- Parameters
s_prime – the state to compute the belief of
o – the observation
a1 – the action of player 1
b – the current belief point
pi2 – the policy of player 2
l – the number of stops remaining
config – the game config
- Returns
b_prime(s_prime)
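For reference, the update computed by this filter has the following general form (the exact tensor index conventions are an assumption; the implementation is authoritative):

b'(s') = \frac{\sum_{a_2 \in A_2} \sum_{s \in S} b(s)\, \pi_2(s, a_2)\, P(s' \mid s, a_1, a_2, l)\, P(o \mid s', a_1, a_2)}{\sum_{\bar{s} \in S} \sum_{a_2 \in A_2} \sum_{s \in S} b(s)\, \pi_2(s, a_2)\, P(\bar{s} \mid s, a_1, a_2, l)\, P(o \mid \bar{s}, a_1, a_2)}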
- static defender_actions() ndarray[Any, dtype[int64]] [source]
Gets the action space of the defender
- Returns
the action space of the defender
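The getters documented above (attacker_actions, b1, defender_actions) take no arguments and can be called directly; a small sketch:

from gym_csle_stopping_game.util.stopping_game_util import StoppingGameUtil

b1 = StoppingGameUtil.b1()                  # initial belief over the game states
A1 = StoppingGameUtil.defender_actions()    # defender action space
A2 = StoppingGameUtil.attacker_actions()    # attacker action space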
- static find_nearest_neighbor_belief(belief_space: ndarray[Any, dtype[float64]], target_belief: ndarray[Any, dtype[float64]]) ndarray[Any, dtype[float64]] [source]
Finds the nearest neighbor (in the Euclidean sense) of a given belief in a certain belief space
- Parameters
belief_space – the belief space to search
target_belief – the belief to find the nearest neighbor of
- Returns
the nearest neighbor belief from the belief space
- static generate_aggregate_belief_reward_tensor(aggregate_belief_space: ndarray[Any, dtype[float64]], S: ndarray[Any, dtype[int64]], A: ndarray[Any, dtype[int64]], R: ndarray[Any, dtype[float64]]) ndarray[Any, dtype[float64]] [source]
Generates an aggregate reward tensor for the aggregate belief MDP
- Parameters
aggregate_belief_space – the aggregate belief space
S – the state space of the POMDP
A – the action space of the POMDP
R – the reward tensor of the POMDP
- Returns
the reward tensor of the aggregate belief MDP
- static generate_aggregate_belief_space(n: int, belief_space_dimension: int) ndarray[Any, dtype[float64]] [source]
Generates an aggregate belief space B_n
- Parameters
n – the aggregation resolution
belief_space_dimension – the belief space dimension
- Returns
the aggregate belief space
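A small sketch combining generate_aggregate_belief_space with find_nearest_neighbor_belief (documented above); the resolution n=2 and the 3-dimensional target belief are hypothetical values:

import numpy as np
from gym_csle_stopping_game.util.stopping_game_util import StoppingGameUtil

# Aggregate belief space with components in multiples of 1/2 over 3 dimensions
B = StoppingGameUtil.generate_aggregate_belief_space(n=2, belief_space_dimension=3)
target = np.array([0.6, 0.3, 0.1])
nearest = StoppingGameUtil.find_nearest_neighbor_belief(belief_space=B, target_belief=target)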
- static generate_aggregate_belief_transition_operator(aggregate_belief_space: ndarray[Any, dtype[float64]], S: ndarray[Any, dtype[int64]], A: ndarray[Any, dtype[int64]], O: ndarray[Any, dtype[int64]], T: ndarray[Any, dtype[float64]], Z: ndarray[Any, dtype[float64]]) ndarray[Any, dtype[float64]] [source]
Generates an aggregate belief space transition operator
- Parameters
aggregate_belief_space – the aggregate belief space
S – the state space of the POMDP
A – the action space of the POMDP
O – the observation space of the POMDP
T – the transition operator of the POMDP
Z – the observation tensor of the POMDP
- Returns
the aggregate belief space transition operator
- static next_belief(o: int, a1: int, b: ndarray[Any, dtype[float64]], pi2: ndarray[Any, dtype[Any]], config: StoppingGameConfig, l: int, a2: int = 0, s: int = 0) ndarray[Any, dtype[float64]] [source]
Computes the next belief using a Bayesian filter
- Parameters
o – the latest observation
a1 – the latest action of player 1
b – the current belief
pi2 – the policy of player 2
config – the game config
l – the number of stops remaining
a2 – the attacker action (for debugging, should be consistent with pi2)
s – the true state (for debugging)
- Returns
the new belief
- static observation_space(n)[source]
Returns the observation space of size n
- Parameters
n – the maximum observation
- Returns
the observation space
- static observation_tensor(n)[source]
Gets the observation tensor
- Parameters
n – the maximum observation
- Returns
a |A1|x|A2|x|S|x|O| tensor
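A minimal sketch of the two observation helpers documented above; the maximum observation value 10 is hypothetical:

from gym_csle_stopping_game.util.stopping_game_util import StoppingGameUtil

O = StoppingGameUtil.observation_space(10)   # observation space
Z = StoppingGameUtil.observation_tensor(10)  # |A1| x |A2| x |S| x |O| tensor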
- static pomdp_bayes_filter(s_prime: int, o: int, a: int, b: ndarray[Any, dtype[float64]], states: ndarray[Any, dtype[int64]], observations: ndarray[Any, dtype[int64]], observation_tensor: ndarray[Any, dtype[float64]], transition_tensor: ndarray[Any, dtype[float64]]) float [source]
A Bayesian filter to compute b[s_prime] of the POMDP
- Parameters
s_prime – the state to compute the belief for
o – the latest observation
a – the latest action
b – the current belief
states – the list of states
observations – the list of observations
observation_tensor – the observation tensor
transition_tensor – the transition tensor of the POMDP
- Returns
b[s_prime]
- static pomdp_next_belief(o: int, a: int, b: ndarray[Any, dtype[float64]], states: ndarray[Any, dtype[int64]], observations: ndarray[Any, dtype[int64]], observation_tensor: ndarray[Any, dtype[float64]], transition_tensor: ndarray[Any, dtype[float64]]) ndarray[Any, dtype[float64]] [source]
Computes the next belief of the POMDP using a Bayesian filter
- Parameters
o – the latest observation
a – the latest action of player 1
b – the current belief
states – the list of states
observations – the list of observations
observation_tensor – the observation tensor
transition_tensor – the transition tensor
- Returns
the new belief
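A sketch of a single POMDP belief update with a hypothetical 2-state, 2-action, 2-observation model; the assumed tensor shapes (transitions |A| x |S| x |S|, observations |A| x |S| x |O|) are illustrative, not prescribed by the API:

import numpy as np
from gym_csle_stopping_game.util.stopping_game_util import StoppingGameUtil

states = np.array([0, 1])
observations = np.array([0, 1])
transition_tensor = np.array([[[0.9, 0.1], [0.2, 0.8]],
                              [[0.7, 0.3], [0.4, 0.6]]])
observation_tensor = np.array([[[0.8, 0.2], [0.3, 0.7]],
                               [[0.6, 0.4], [0.5, 0.5]]])
b = np.array([0.5, 0.5])
b_prime = StoppingGameUtil.pomdp_next_belief(
    o=1, a=0, b=b, states=states, observations=observations,
    observation_tensor=observation_tensor, transition_tensor=transition_tensor)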
- static pomdp_solver_file(config: StoppingGameConfig, discount_factor: float, pi2: ndarray[Any, dtype[Any]]) str [source]
Gets the POMDP environment specification based on the format at http://www.pomdp.org/code/index.html, for the defender’s local problem against a static attacker
- Parameters
config – the POMDP config
discount_factor – the discount factor
pi2 – the attacker strategy
- Returns
the file content as a string
- static reduce_R_attacker(R: ndarray[Any, dtype[float64]], strategy: Policy) ndarray[Any, dtype[float64]] [source]
Reduces the reward tensor based on a given attacker strategy
- Parameters
R – the reward tensor to reduce
strategy – the strategy to use for the reduction
- Returns
the reduced reward tensor (|A1|x|S|)
- static reduce_R_defender(R: ndarray[Any, dtype[float64]], strategy: Policy) ndarray[Any, dtype[float64]] [source]
Reduces the reward tensor based on a given defender strategy
- Parameters
R – the reward tensor to reduce
strategy – the strategy to use for the reduction
- Returns
the reduced reward tensor (|A2|x|S|)
- static reduce_T_attacker(T: ndarray[Any, dtype[float64]], strategy: Policy) ndarray[Any, dtype[float64]] [source]
Reduces the transition tensor based on a given attacker strategy
- Parameters
T – the tensor to reduce
strategy – the strategy to use for the reduction
- Returns
the reduced tensor (|A1|x|S|x|S|)
- static reduce_T_defender(T: ndarray[Any, dtype[float64]], strategy: Policy) ndarray[Any, dtype[float64]] [source]
Reduces the transition tensor based on a given defender strategy
- Parameters
T – the tensor to reduce
strategy – the strategy to use for the reduction
- Returns
the reduced tensor (|A2|x|S|x|S|)
- static reduce_Z_attacker(Z: ndarray[Any, dtype[float64]], strategy: Policy) ndarray[Any, dtype[float64]] [source]
Reduces the observation tensor based on a given attacker strategy
- Parameters
Z – the observation tensor to reduce
strategy – the strategy to use for the reduction
- Returns
the reduced observation tensor (|A1|x|S|x|O|)
- static reward_tensor(R_SLA: int, R_INT: int, R_COST: int, L: int, R_ST: int) ndarray[Any, dtype[Any]] [source]
Gets the reward tensor
- Parameters
R_SLA – the R_SLA constant
R_INT – the R_INT constant
R_COST – the R_COST constant
L – the number of stops, i.e. the size of the first dimension of the tensor
R_ST – the R_ST constant
- Returns
a |L|x|A1|x|A2|x|S| tensor
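A sketch of constructing the reward tensor; the constants below are hypothetical example values, not recommended settings:

from gym_csle_stopping_game.util.stopping_game_util import StoppingGameUtil

R = StoppingGameUtil.reward_tensor(R_SLA=1, R_INT=-10, R_COST=-5, L=3, R_ST=20)
# R is indexed by (l, a1, a2, s), per the |L|x|A1|x|A2|x|S| shape documented above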
- static sample_attacker_action(pi2: ndarray[Any, dtype[Any]], s: int) int [source]
Samples the attacker action
- Parameters
pi2 – the attacker policy
s – the game state
- Returns
the attacker action a2
- static sample_initial_state(b1: ndarray[Any, dtype[float64]]) int [source]
Samples the initial state
- Parameters
b1 – the initial belief
- Returns
s1
- static sample_next_observation(Z: ndarray[Any, dtype[Any]], s_prime: int, O: ndarray[Any, dtype[int64]]) int [source]
Samples the next observation
- Parameters
Z – the observation tensor, which includes the observation probabilities
s_prime – the new state
O – the observation space
- Returns
o
- static sample_next_state(T: ndarray[Any, dtype[Any]], l: int, s: int, a1: int, a2: int, S: ndarray[Any, dtype[int64]]) int [source]
Samples the next state
- Parameters
T – the transition operator
l – the number of stops remaining
s – the current state
a1 – the defender action
a2 – the attacker action
S – the state space
- Returns
s’
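A sketch of sampling an initial state and a static attacker action; the uniform attacker strategy below is a hypothetical stand-in, and the assumption that pi2[s] is a probability distribution over the attacker actions in state s is illustrative:

import numpy as np
from gym_csle_stopping_game.util.stopping_game_util import StoppingGameUtil

b1 = StoppingGameUtil.b1()
s = StoppingGameUtil.sample_initial_state(b1=b1)

A2 = StoppingGameUtil.attacker_actions()
pi2 = np.ones((len(b1), len(A2))) / len(A2)   # hypothetical uniform attacker strategy
a2 = StoppingGameUtil.sample_attacker_action(pi2=pi2, s=s)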