gym_csle_stopping_game.util package
Submodules
gym_csle_stopping_game.util.stopping_game_util module
- class gym_csle_stopping_game.util.stopping_game_util.StoppingGameUtil[source]
Bases: object
Class with utility functions for the StoppingGame Environment
- static aggregate_belief_mdp_defender(aggregation_resolution: int, T: ndarray[Any, dtype[float64]], R: ndarray[Any, dtype[float64]], Z: ndarray[Any, dtype[float64]], S: ndarray[Any, dtype[int64]], A: ndarray[Any, dtype[int64]], O: ndarray[Any, dtype[int64]]) Tuple[ndarray[Any, dtype[float64]], ndarray[Any, dtype[int64]], ndarray[Any, dtype[float64]], ndarray[Any, dtype[float64]]] [source]
Generates an aggregate belief MDP from a given POMDP specification and aggregation resolution
- Parameters
aggregation_resolution – the belief aggregation resolution
T – the transition tensor of the POMDP
R – the reward tensor of the POMDP
Z – the observation tensor of the POMDP
S – the state space of the POMDP
A – the action space of the POMDP
O – the observation space of the POMDP
- Returns
the state space, action space, transition operator, and belief operator of the belief MDP
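A minimal usage sketch (not part of the API reference): it builds an aggregate belief MDP for a hypothetical 2-state, 2-action, 2-observation POMDP. The concrete tensor values and the assumed index conventions (T: |A|x|S|x|S|, R: |A|x|S|, Z: |A|x|S|x|O|) are illustrative assumptions, not values prescribed by the library:

import numpy as np
from gym_csle_stopping_game.util.stopping_game_util import StoppingGameUtil

# Hypothetical toy POMDP (all dimensions of size 2)
S = np.array([0, 1])
A = np.array([0, 1])
O = np.array([0, 1])
T = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.4, 0.6]]])   # assumed shape |A| x |S| x |S|
R = np.array([[1.0, -1.0],
              [0.5, -0.5]])                # assumed shape |A| x |S|
Z = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.6, 0.4], [0.5, 0.5]]])   # assumed shape |A| x |S| x |O|

# Returns the components of the aggregate belief MDP documented above
belief_mdp = StoppingGameUtil.aggregate_belief_mdp_defender(
    aggregation_resolution=10, T=T, R=R, Z=Z, S=S, A=A, O=O)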
- static aggregate_belief_transition_probability(b1: ndarray[Any, dtype[float64]], b2: ndarray[Any, dtype[float64]], a: int, S: ndarray[Any, dtype[int64]], O: ndarray[Any, dtype[int64]], A: ndarray[Any, dtype[int64]], T: ndarray[Any, dtype[float64]], Z: ndarray[Any, dtype[float64]], aggregate_belief_space: ndarray[Any, dtype[float64]]) float [source]
Calculates the probability of transitioning from belief b1 to belief b2 when taking action a
- Parameters
b1 – the source belief
b2 – the target belief
a – the action
S – the state space of the POMDP
O – the observation space of the POMDP
A – the action space of the POMDP
T – the transition operator
Z – the observation tensor
aggregate_belief_space – the aggregate belief space
- Returns
the probability P(b2 | b1, a)
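The sketch below evaluates the transition probability between two aggregate beliefs, using the same hypothetical toy POMDP and shape assumptions as in the previous example:

import numpy as np
from gym_csle_stopping_game.util.stopping_game_util import StoppingGameUtil

S = np.array([0, 1])
A = np.array([0, 1])
O = np.array([0, 1])
T = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.4, 0.6]]])
Z = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.6, 0.4], [0.5, 0.5]]])

# Aggregate belief space over the two states (resolution 10 is a hypothetical choice)
B = StoppingGameUtil.generate_aggregate_belief_space(n=10, belief_space_dimension=len(S))
p = StoppingGameUtil.aggregate_belief_transition_probability(
    b1=B[0], b2=B[1], a=0, S=S, O=O, A=A, T=T, Z=Z, aggregate_belief_space=B)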
- static attacker_actions() ndarray[Any, dtype[int64]] [source]
Gets the action space of the attacker
- Returns
the action space of the attacker
- static b1() ndarray[Any, dtype[float64]] [source]
Gets the initial belief
- Returns
the initial belief
- static bayes_filter(s_prime: int, o: int, a1: int, b: ndarray[Any, dtype[float64]], pi2: ndarray[Any, dtype[Any]], l: int, config: StoppingGameConfig) float [source]
A Bayesian filter to compute the belief of player 1 that the state is s_prime when observing o after taking action a1 in belief b, given that the opponent follows strategy pi2
- Parameters
s_prime – the state to compute the belief of
o – the observation
a1 – the action of player 1
b – the current belief point
pi2 – the policy of player 2
l – the number of stops remaining
config – the game config
- Returns
b_prime(s_prime)
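For reference, the update computed by this filter has the following general form (the exact tensor index conventions are an assumption; the implementation is authoritative):

b'(s') = \frac{\sum_{a_2 \in A_2} \sum_{s \in S} b(s)\, \pi_2(s, a_2)\, P(s' \mid s, a_1, a_2, l)\, P(o \mid s', a_1, a_2)}{\sum_{\bar{s} \in S} \sum_{a_2 \in A_2} \sum_{s \in S} b(s)\, \pi_2(s, a_2)\, P(\bar{s} \mid s, a_1, a_2, l)\, P(o \mid \bar{s}, a_1, a_2)}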
- static defender_actions() ndarray[Any, dtype[int64]] [source]
Gets the action space of the defender
- Returns
the action space of the defender
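The getters documented above (attacker_actions, b1, defender_actions) take no arguments and can be called directly; a small sketch:

from gym_csle_stopping_game.util.stopping_game_util import StoppingGameUtil

b1 = StoppingGameUtil.b1()                  # initial belief over the game states
A1 = StoppingGameUtil.defender_actions()    # defender action space
A2 = StoppingGameUtil.attacker_actions()    # attacker action space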
- static find_nearest_neighbor_belief(belief_space: ndarray[Any, dtype[float64]], target_belief: ndarray[Any, dtype[float64]]) ndarray[Any, dtype[float64]] [source]
Finds the nearest neighbor (in the Euclidean sense) of a given belief in a certain belief space
- Parameters
belief_space – the belief space to search
target_belief – the belief to find the nearest neighbor of
- Returns
the nearest neighbor belief from the belief space
- static generate_aggregate_belief_reward_tensor(aggregate_belief_space: ndarray[Any, dtype[float64]], S: ndarray[Any, dtype[int64]], A: ndarray[Any, dtype[int64]], R: ndarray[Any, dtype[float64]]) ndarray[Any, dtype[float64]] [source]
Generates an aggregate reward tensor for the aggregate belief MDP
- Parameters
aggregate_belief_space – the aggregate belief space
S – the state space of the POMDP
A – the action space of the POMDP
R – the reward tensor of the POMDP
- Returns
the reward tensor of the aggregate belief MDP
- static generate_aggregate_belief_space(n: int, belief_space_dimension: int) ndarray[Any, dtype[float64]] [source]
Generates an aggregate belief space B_n
- Parameters
n – the aggregation resolution
belief_space_dimension – the belief space dimension
- Returns
the aggregate belief space
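A small sketch combining generate_aggregate_belief_space with find_nearest_neighbor_belief (documented above); the resolution n=2 and the 3-dimensional target belief are hypothetical values:

import numpy as np
from gym_csle_stopping_game.util.stopping_game_util import StoppingGameUtil

# Aggregate belief space with components in multiples of 1/2 over 3 dimensions
B = StoppingGameUtil.generate_aggregate_belief_space(n=2, belief_space_dimension=3)
target = np.array([0.6, 0.3, 0.1])
nearest = StoppingGameUtil.find_nearest_neighbor_belief(belief_space=B, target_belief=target)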
- static generate_aggregate_belief_transition_operator(aggregate_belief_space: ndarray[Any, dtype[float64]], S: ndarray[Any, dtype[int64]], A: ndarray[Any, dtype[int64]], O: ndarray[Any, dtype[int64]], T: ndarray[Any, dtype[float64]], Z: ndarray[Any, dtype[float64]]) ndarray[Any, dtype[float64]] [source]
Generates an aggregate belief space transition operator
- Parameters
aggregate_belief_space – the aggregate belief space
S – the state space of the POMDP
A – the action space of the POMDP
O – the observation space of the POMDP
T – the transition operator of the POMDP
Z – the observation tensor of the POMDP
- Returns
the aggregate belief space transition operator
- static next_belief(o: int, a1: int, b: ndarray[Any, dtype[float64]], pi2: ndarray[Any, dtype[Any]], config: StoppingGameConfig, l: int, a2: int = 0, s: int = 0) ndarray[Any, dtype[float64]] [source]
Computes the next belief using a Bayesian filter
- Parameters
o – the latest observation
a1 – the latest action of player 1
b – the current belief
pi2 – the policy of player 2
config – the game config
l – the number of stops remaining
a2 – the attacker action (for debugging, should be consistent with pi2)
s – the true state (for debugging)
- Returns
the new belief
- static observation_space(n)[source]
Returns the observation space of size n
- Parameters
n – the maximum observation
- Returns
the observation space
- static observation_tensor(n)[source]
Gets the observation tensor
- Parameters
n – the maximum observation
- Returns
a |A1|x|A2|x|S|x|O| tensor
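A minimal sketch of the two observation helpers documented above; the maximum observation value 10 is hypothetical:

from gym_csle_stopping_game.util.stopping_game_util import StoppingGameUtil

O = StoppingGameUtil.observation_space(10)   # observation space
Z = StoppingGameUtil.observation_tensor(10)  # |A1| x |A2| x |S| x |O| tensor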
- static pomdp_bayes_filter(s_prime: int, o: int, a: int, b: ndarray[Any, dtype[float64]], states: ndarray[Any, dtype[int64]], observations: ndarray[Any, dtype[int64]], observation_tensor: ndarray[Any, dtype[float64]], transition_tensor: ndarray[Any, dtype[float64]]) float [source]
A Bayesian filter to compute b[s_prime] of the POMDP
- Parameters
s_prime – the state to compute the belief for
o – the latest observation
a – the latest action
b – the current belief
states – the list of states
observations – the list of observations
observation_tensor – the observation tensor
transition_tensor – the transition tensor of the POMDP
- Returns
b[s_prime]
- static pomdp_next_belief(o: int, a: int, b: ndarray[Any, dtype[float64]], states: ndarray[Any, dtype[int64]], observations: ndarray[Any, dtype[int64]], observation_tensor: ndarray[Any, dtype[float64]], transition_tensor: ndarray[Any, dtype[float64]]) ndarray[Any, dtype[float64]] [source]
Computes the next belief of the POMDP using a Bayesian filter
- Parameters
o – the latest observation
a – the latest action of player 1
b – the current belief
states – the list of states
observations – the list of observations
observation_tensor – the observation tensor
transition_tensor – the transition tensor
- Returns
the new belief
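A sketch of a single POMDP belief update with a hypothetical 2-state, 2-action, 2-observation model; the assumed tensor shapes (transitions |A| x |S| x |S|, observations |A| x |S| x |O|) are illustrative, not prescribed by the API:

import numpy as np
from gym_csle_stopping_game.util.stopping_game_util import StoppingGameUtil

states = np.array([0, 1])
observations = np.array([0, 1])
transition_tensor = np.array([[[0.9, 0.1], [0.2, 0.8]],
                              [[0.7, 0.3], [0.4, 0.6]]])
observation_tensor = np.array([[[0.8, 0.2], [0.3, 0.7]],
                               [[0.6, 0.4], [0.5, 0.5]]])
b = np.array([0.5, 0.5])
b_prime = StoppingGameUtil.pomdp_next_belief(
    o=1, a=0, b=b, states=states, observations=observations,
    observation_tensor=observation_tensor, transition_tensor=transition_tensor)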
- static pomdp_solver_file(config: StoppingGameConfig, discount_factor: float, pi2: ndarray[Any, dtype[Any]]) str [source]
Gets the POMDP environment specification based on the format at http://www.pomdp.org/code/index.html, for the defender’s local problem against a static attacker
- Parameters
config – the POMDP config
discount_factor – the discount factor
pi2 – the attacker strategy
- Returns
the file content as a string
- static reduce_R_attacker(R: ndarray[Any, dtype[float64]], strategy: Policy) ndarray[Any, dtype[float64]] [source]
Reduces the reward tensor based on a given attacker strategy
- Parameters
R – the reward tensor to reduce
strategy – the strategy to use for the reduction
- Returns
the reduced reward tensor (|A1|x|S|)
- static reduce_R_defender(R: ndarray[Any, dtype[float64]], strategy: Policy) ndarray[Any, dtype[float64]] [source]
Reduces the reward tensor based on a given defender strategy
- Parameters
R – the reward tensor to reduce
strategy – the strategy to use for the reduction
- Returns
the reduced reward tensor (|A2|x|S|)
- static reduce_T_attacker(T: ndarray[Any, dtype[float64]], strategy: Policy) ndarray[Any, dtype[float64]] [source]
Reduces the transition tensor based on a given attacker strategy
- Parameters
T – the tensor to reduce
strategy – the strategy to use for the reduction
- Returns
the reduced tensor (|A1|x|S|x|S|)
- static reduce_T_defender(T: ndarray[Any, dtype[float64]], strategy: Policy) ndarray[Any, dtype[float64]] [source]
Reduces the transition tensor based on a given defender strategy
- Parameters
T – the tensor to reduce
strategy – the strategy to use for the reduction
- Returns
the reduced tensor (|A2|x|S|x|S|)
- static reduce_Z_attacker(Z: ndarray[Any, dtype[float64]], strategy: Policy) ndarray[Any, dtype[float64]] [source]
Reduces the observation tensor based on a given attacker strategy
- Parameters
Z – the observation tensor to reduce
strategy – the strategy to use for the reduction
- Returns
the reduced observation tensor (|A1|x|S|x|O|)
- static reward_tensor(R_SLA: int, R_INT: int, R_COST: int, L: int, R_ST: int) ndarray[Any, dtype[Any]] [source]
Gets the reward tensor
- Parameters
R_SLA – the R_SLA constant
R_INT – the R_INT constant
R_COST – the R_COST constant
L – the number of stops, i.e. the size of the first dimension of the tensor
R_ST – the R_ST constant
- Returns
a |L|x|A1|x|A2|x|S| tensor
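A sketch of constructing the reward tensor; the constants below are hypothetical example values, not recommended settings:

from gym_csle_stopping_game.util.stopping_game_util import StoppingGameUtil

R = StoppingGameUtil.reward_tensor(R_SLA=1, R_INT=-10, R_COST=-5, L=3, R_ST=20)
# R is indexed by (l, a1, a2, s), per the |L|x|A1|x|A2|x|S| shape documented above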
- static sample_attacker_action(pi2: ndarray[Any, dtype[Any]], s: int) int [source]
Samples the attacker action
- Parameters
pi2 – the attacker policy
s – the game state
- Returns
the attacker action a2
- static sample_initial_state(b1: ndarray[Any, dtype[float64]]) int [source]
Samples the initial state
- Parameters
b1 – the initial belief
- Returns
s1
- static sample_next_observation(Z: ndarray[Any, dtype[Any]], s_prime: int, O: ndarray[Any, dtype[int64]]) int [source]
Samples the next observation
- Parameters
Z – the observation tensor, which includes the observation probabilities
s_prime – the new state
O – the observation space
- Returns
o
- static sample_next_state(T: ndarray[Any, dtype[Any]], l: int, s: int, a1: int, a2: int, S: ndarray[Any, dtype[int64]]) int [source]
Samples the next state
- Parameters
T – the transition operator
l – the number of stops remaining
s – the current state
a1 – the defender action
a2 – the attacker action
S – the state space
- Returns
s’
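A sketch of sampling an initial state and a static attacker action; the uniform attacker strategy below is a hypothetical stand-in, and the assumption that pi2[s] is a probability distribution over the attacker actions in state s is illustrative:

import numpy as np
from gym_csle_stopping_game.util.stopping_game_util import StoppingGameUtil

b1 = StoppingGameUtil.b1()
s = StoppingGameUtil.sample_initial_state(b1=b1)

A2 = StoppingGameUtil.attacker_actions()
pi2 = np.ones((len(b1), len(A2))) / len(A2)   # hypothetical uniform attacker strategy
a2 = StoppingGameUtil.sample_attacker_action(pi2=pi2, s=s)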