gym_csle_apt_game.util package
Submodules
gym_csle_apt_game.util.apt_game_util module
- class gym_csle_apt_game.util.apt_game_util.AptGameUtil[source]
Bases:
object
Class with utility functions for the APT game environment
- static attacker_actions() numpy.ndarray[Any, numpy.dtype[numpy.int64]] [source]
Gets the action space of the attacker
- Returns
the action space of the attacker
- static b1(N: int) numpy.ndarray[Any, numpy.dtype[numpy.float64]] [source]
Gets the initial belief
- Parameters
N – the number of servers
- Returns
the initial belief
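A minimal sketch of one plausible form of this prior, assuming the game starts with no compromised servers so that the belief is a point mass on s = 0 over the N + 1 states 0, ..., N (an assumption; the actual prior returned by b1 may differ):

    import numpy as np

    def b1_sketch(N: int) -> np.ndarray:
        # Point mass on state 0 (no server compromised); states are 0, ..., N.
        b = np.zeros(N + 1)
        b[0] = 1.0
        return b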
- static bayes_filter(s_prime: int, o: int, a1: int, b: numpy.ndarray[Any, numpy.dtype[numpy.float64]], pi2: numpy.ndarray[Any, numpy.dtype[Any]], config: gym_csle_apt_game.dao.apt_game_config.AptGameConfig) float [source]
A Bayesian filter that computes the belief of player 1 of being in s_prime when observing o after taking action a1 in belief b, given that the opponent follows strategy pi2
- Parameters
s_prime – the state to compute the belief of
o – the observation
a1 – the action of player 1
b – the current belief point
pi2 – the policy of player 2
config – the game configuration
- Returns
b_prime(s_prime)
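A minimal sketch of the standard POMDP update that bayes_filter computes, assuming hypothetical tensor layouts Z[s'][o] for the observation tensor and T[a1][a2][s][s'] for the transition tensor (the real layouts live in AptGameConfig):

    import numpy as np

    def bayes_filter_sketch(s_prime: int, o: int, a1: int, b: np.ndarray,
                            pi2: np.ndarray, T: np.ndarray, Z: np.ndarray) -> float:
        S = range(len(b))
        A2 = range(pi2.shape[1])
        # Unnormalized posterior mass on s_prime given observation o.
        num = Z[s_prime][o] * sum(b[s] * pi2[s][a2] * T[a1][a2][s][s_prime]
                                  for s in S for a2 in A2)
        # Normalizing constant: the same expression summed over all next states.
        den = sum(Z[sp][o] * b[s] * pi2[s][a2] * T[a1][a2][s][sp]
                  for sp in S for s in S for a2 in A2)
        return num / den if den > 0 else 0.0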
- static cost_function(s: int, a_1: int) float [source]
The cost function of the game
- Parameters
s – the state
a_1 – the defender action
- Returns
the immediate cost
- static cost_tensor(N: int) numpy.ndarray[Any, numpy.dtype[Any]] [source]
Gets the cost tensor of the game
- Parameters
N – the number of servers
- Returns
a |A1|x|S| tensor
- static defender_actions() numpy.ndarray[Any, numpy.dtype[numpy.int64]] [source]
Gets the action space of the defender
- Returns
the action space of the defender
- static expected_cost(C: List[List[float]], b: List[float], S: List[int], a1: int) float [source]
Gets the expected cost of defender action a1 in belief state b
- Parameters
C – the cost tensor
b – the belief state
S – the state space
a1 – the defender action
- Returns
the expected cost
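A minimal sketch of this computation, assuming C is indexed as C[a1][s], consistent with the |A1|x|S| tensor returned by cost_tensor:

    from typing import List

    def expected_cost_sketch(C: List[List[float]], b: List[float],
                             S: List[int], a1: int) -> float:
        # Belief-weighted average of the immediate cost of a1 over the states.
        return sum(b[s] * C[a1][s] for s in S)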
- static generate_os_posg_game_file(game_config: gym_csle_apt_game.dao.apt_game_config.AptGameConfig) str [source]
Generates the POSG game file for HSVI
- Parameters
game_config – the game configuration
- Returns
a string with the contents of the config file
- static generate_rewards(game_config: gym_csle_apt_game.dao.apt_game_config.AptGameConfig) List[str] [source]
Generates the reward rows of the POSG config file of HSVI
- Parameters
game_config – the game configuration
- Returns
list of reward rows
- static generate_transitions(game_config: gym_csle_apt_game.dao.apt_game_config.AptGameConfig) List[str] [source]
Generates the transition rows of the POSG config file of HSVI
- Parameters
game_config – the game configuration
- Returns
list of transition rows
- static next_belief(o: int, a1: int, b: numpy.ndarray[Any, numpy.dtype[numpy.float64]], pi2: numpy.ndarray[Any, numpy.dtype[Any]], config: gym_csle_apt_game.dao.apt_game_config.AptGameConfig, a2: int = 0, s: int = 0) numpy.ndarray[Any, numpy.dtype[numpy.float64]] [source]
Computes the next belief using a Bayesian filter
- Parameters
o – the latest observation
a1 – the latest action of player 1
b – the current belief
pi2 – the policy of player 2
config – the game config
a2 – the attacker action (for debugging, should be consistent with pi2)
s – the true state (for debugging)
- Returns
the new belief
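Conceptually, next_belief evaluates the filter above once per candidate state; a minimal sketch reusing the hypothetical bayes_filter_sketch defined earlier:

    import numpy as np

    def next_belief_sketch(o: int, a1: int, b: np.ndarray, pi2: np.ndarray,
                           T: np.ndarray, Z: np.ndarray) -> np.ndarray:
        # One Bayes-filter evaluation per candidate next state s'.
        b_prime = np.array([bayes_filter_sketch(sp, o, a1, b, pi2, T, Z)
                            for sp in range(len(b))])
        # The updated belief must remain a probability distribution.
        assert abs(float(b_prime.sum()) - 1.0) < 1e-6
        return b_prime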
- static observation_space(num_observations: int)[source]
Returns the observation space of size num_observations
- Parameters
num_observations – the number of observations
- Returns
the observation space
- static observation_tensor(num_observations, N: int) numpy.ndarray[Any, numpy.dtype[Any]] [source]
Gets the observation tensor of the game
- Parameters
num_observations – the number of observations
N – the number of servers
- Returns
a |S|x|O| observation tensor
- static sample_attacker_action(pi2: numpy.ndarray[Any, numpy.dtype[Any]], s: int) int [source]
Samples the attacker action
- Parameters
pi2 – the attacker strategy
s – the game state
- Returns
a2 (the attacker action)
- static sample_defender_action(alpha: float, b: List[float]) int [source]
Samples the defender action
- Parameters
alpha – the defender threshold
b – the belief state
- Returns
a1 (the defender action)
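One plausible reading of a threshold strategy over beliefs, purely as an illustrative guess (the actual rule used by sample_defender_action may differ): the defender takes the active action when the belief-weighted expected state crosses alpha.

    from typing import List

    def sample_defender_action_sketch(alpha: float, b: List[float]) -> int:
        # Expected state under the belief; in this game, state s counts
        # compromised servers, so this is an illustrative intrusion estimate.
        expected_s = sum(s * prob for s, prob in enumerate(b))
        return 1 if expected_s >= alpha else 0  # 1 = act, 0 = wait (assumed coding)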
- static sample_initial_state(b1: numpy.ndarray[Any, numpy.dtype[numpy.float64]]) int [source]
Samples the initial state
- Parameters
b1 – the initial belief
- Returns
s1
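A minimal sketch, assuming the state is drawn from the categorical distribution defined by b1:

    import numpy as np

    def sample_initial_state_sketch(b1: np.ndarray) -> int:
        # Draw s1 ~ b1 over the states 0, ..., len(b1) - 1.
        return int(np.random.choice(len(b1), p=b1))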
- static sample_next_observation(Z: numpy.ndarray[Any, numpy.dtype[Any]], s_prime: int, O: numpy.ndarray[Any, numpy.dtype[numpy.int64]]) int [source]
Samples the next observation
- Parameters
Z – the observation tensor
s_prime – the new state
O – the observation space
- Returns
o
- static sample_next_state(T: numpy.ndarray[Any, numpy.dtype[Any]], s: int, a1: int, a2: int, S: numpy.ndarray[Any, numpy.dtype[numpy.int64]]) int [source]
Samples the next state
- Parameters
T – the transition operator
s – the current state
a1 – the defender action
a2 – the attacker action
S – the state space
- Returns
s'
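A minimal sketch, assuming the row T[a1][a2][s] holds the distribution over next states (a hypothetical index order):

    import numpy as np

    def sample_next_state_sketch(T: np.ndarray, s: int, a1: int, a2: int,
                                 S: np.ndarray) -> int:
        # Draw s' from the transition distribution of (s, a1, a2).
        return int(np.random.choice(S, p=T[a1][a2][s]))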
- static state_space(N: int)[source]
Gets the state space
- Parameters
N – the number of servers
- Returns
the state space of the game
- static transition_function(N: int, p_a: float, s: int, s_prime: int, a_1: int, a_2: int) float [source]
The transition function of the game
- Parameters
N – the number of servers
p_a – the intrusion probability
s – the state
s_prime – the next state
a_1 – the defender action
a_2 – the attacker action
- Returns
f(s_prime | s, a_1, a_2)
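The full transition tensor can be assembled by evaluating this function over all index combinations; a minimal sketch using the static method itself, assuming binary action spaces for both players and the states 0, ..., N (assumptions, not confirmed by the docs):

    import numpy as np
    from gym_csle_apt_game.util.apt_game_util import AptGameUtil

    def build_transition_tensor(N: int, p_a: float) -> np.ndarray:
        # T[a1][a2][s][s'] = f(s' | s, a1, a2); binary actions assumed.
        T = np.zeros((2, 2, N + 1, N + 1))
        for a1 in range(2):
            for a2 in range(2):
                for s in range(N + 1):
                    for s_prime in range(N + 1):
                        T[a1][a2][s][s_prime] = AptGameUtil.transition_function(
                            N=N, p_a=p_a, s=s, s_prime=s_prime, a_1=a1, a_2=a2)
        return T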
gym_csle_apt_game.util.rollout_util module
- class gym_csle_apt_game.util.rollout_util.RolloutUtil[source]
Bases:
object
Class with utility functions for rollout
- static eval_attacker_base(alpha: float, pi2: numpy.ndarray[Any, numpy.dtype[Any]], config: gym_csle_apt_game.dao.apt_game_config.AptGameConfig, horizon: int, s: Optional[int], b: numpy.ndarray[Any, numpy.dtype[Any]], id: int) float [source]
Function for evaluating a base threshold strategy of the attacker
- Parameters
alpha – the defender’s threshold
pi2 – the attacker’s base strategy
config – the game configuration
horizon – the horizon for the Monte-Carlo sampling
s – the state
b – the belief
id – the id of the parallel processor
- Returns
the average return
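A minimal sketch of the episode loop such an evaluation performs, assuming undiscounted costs, the utility methods documented above, and tensors T, Z, C with the index layouts assumed earlier (the belief update is elided because it needs the full game config):

    import numpy as np
    from gym_csle_apt_game.util.apt_game_util import AptGameUtil

    def eval_base_episode_sketch(alpha, pi2, T, Z, C, O, b1, horizon) -> float:
        # One Monte-Carlo episode under the defender threshold strategy
        # and the attacker base strategy pi2.
        s = AptGameUtil.sample_initial_state(b1=np.array(b1))
        total = 0.0
        for _ in range(horizon):
            a1 = AptGameUtil.sample_defender_action(alpha=alpha, b=list(b1))
            a2 = AptGameUtil.sample_attacker_action(pi2=pi2, s=s)
            total += C[a1][s]
            s = AptGameUtil.sample_next_state(T=T, s=s, a1=a1, a2=a2,
                                              S=np.arange(T.shape[2]))
            o = AptGameUtil.sample_next_observation(Z=Z, s_prime=s, O=O)
            # AptGameUtil.next_belief(o, a1, b, pi2, config) would update the
            # belief here; omitted since it requires an AptGameConfig.
        return total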
- static eval_attacker_base_parallel(alpha: float, pi2: numpy.ndarray[Any, numpy.dtype[Any]], config: gym_csle_apt_game.dao.apt_game_config.AptGameConfig, num_samples: int, horizon: int, s: Optional[int], b: List[float]) float [source]
Starts a pool of parallel processors for evaluating a threshold base strategy of the attacker
- Parameters
alpha – the threshold of the defender
pi2 – the base strategy of the attacker
config – the game configuration
num_samples – the number of Monte-Carlo samples
horizon – the horizon of the Monte-Carlo sampling
s – the state
b – the belief
- Returns
the average cost-to-go of the base strategy
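A minimal sketch of the parallel pattern, assuming each worker evaluates one independent episode and the results are averaged (the real method's batching may differ):

    import multiprocessing as mp
    from typing import Callable, List, Tuple

    def eval_parallel_sketch(eval_fn: Callable[..., float],
                             args_list: List[Tuple]) -> float:
        # Fan independent Monte-Carlo evaluations out over a process pool
        # and average the returned costs-to-go.
        with mp.Pool() as pool:
            results = pool.starmap(eval_fn, args_list)
        return sum(results) / len(results)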
- static eval_defender_base(alpha: float, pi2: numpy.ndarray[Any, numpy.dtype[Any]], config: gym_csle_apt_game.dao.apt_game_config.AptGameConfig, horizon: int, s: Optional[int], b: numpy.ndarray[Any, numpy.dtype[Any]], id: int) float [source]
Function for evaluating a base threshold strategy of the defender
- Parameters
alpha – the defender’s threshold
pi2 – the attacker’s strategy
config – the game configuration
horizon – the horizon for the Monte-Carlo sampling
s – the state
b – the belief
id – the id of the parallel processor
- Returns
the average return
- static eval_defender_base_parallel(alpha: float, pi2: numpy.ndarray[Any, numpy.dtype[Any]], config: gym_csle_apt_game.dao.apt_game_config.AptGameConfig, num_samples: int, horizon: int, s: Union[None, int], b: List[float]) float [source]
Starts a pool of parallel processors for evaluating a threshold base strategy of the defender
- Parameters
alpha – the threshold
pi2 – the attacker strategy
config – the game configuration
num_samples – the number of Monte-Carlo samples
horizon – the horizon of the Monte-Carlo sampling
s – the state
b – the belief
- Returns
the average cost-to-go of the base strategy
- static exact_defender_rollout(alpha: float, pi2: numpy.ndarray[Any, numpy.dtype[Any]], config: gym_csle_apt_game.dao.apt_game_config.AptGameConfig, num_samples: int, horizon: int, ell: int, b: List[float]) Tuple[int, float] [source]
Performs exact rollout of the defender against a fixed attacker strategy and with a threshold base strategy
- Parameters
alpha – the threshold of the base strategy
pi2 – the strategy of the attacker
config – the game configuration
num_samples – the number of Monte-Carlo samples
horizon – the horizon for the Monte-Carlo sampling
ell – the lookahead length
b – the belief state
- Returns
The rollout action and the corresponding Q-factor
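With lookahead ell = 1, rollout reduces to minimizing the one-step Q-factor: the expected immediate cost of a1 plus the evaluated cost-to-go of the base strategy from the resulting belief. A minimal sketch with a hypothetical base_cost_to_go callback:

    from typing import Callable, List, Tuple

    def one_step_rollout_sketch(A1: List[int], C, b, S,
                                base_cost_to_go: Callable[[int], float]
                                ) -> Tuple[int, float]:
        # Q(b, a1) = expected immediate cost + base-strategy cost-to-go.
        q = {a1: sum(b[s] * C[a1][s] for s in S) + base_cost_to_go(a1)
             for a1 in A1}
        a1_star = min(q, key=q.get)
        return a1_star, q[a1_star]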
- static monte_carlo_attacker_rollout(alpha: float, pi2: numpy.ndarray[Any, numpy.dtype[Any]], config: gym_csle_apt_game.dao.apt_game_config.AptGameConfig, num_samples: int, horizon: int, ell: int, b: List[float], a1: Union[None, int] = None, s: Union[None, int] = None) Tuple[int, float] [source]
Monte-Carlo-based rollout of the attacker with a threshold base strategy
- Parameters
alpha – the threshold of the defender
pi2 – the base strategy of the attacker
config – the game configuration
num_samples – the number of Monte-Carlo samples
horizon – the horizon for the Monte-Carlo sampling
ell – the lookahead length
b – the belief state
a1 – the action of the defender
s – the state
- Returns
The rollout action and the corresponding Q-factor
- static monte_carlo_defender_rollout(alpha: float, pi2: numpy.ndarray[Any, numpy.dtype[Any]], config: gym_csle_apt_game.dao.apt_game_config.AptGameConfig, num_samples: int, horizon: int, ell: int, b: List[float], a2: Union[None, int] = None, s: Union[None, int] = None) Tuple[int, float] [source]
Monte-Carlo-based rollout of the defender with a threshold base strategy
- Parameters
alpha – the threshold of the base strategy
pi2 – the attacker strategy
config – the game configuration
num_samples – the number of Monte-Carlo samples
horizon – the horizon for the Monte-Carlo sampling
ell – the lookahead length
b – the belief state
a2 – the action of the attacker
s – the state
- Returns
The rollout action and the corresponding Q-factor
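The same greedy pattern underlies the Monte-Carlo variants: estimate each Q-factor by averaging sampled episode costs, then act greedily. A minimal sketch with a hypothetical simulate_cost_to_go callback:

    import numpy as np
    from typing import Callable, List, Tuple

    def mc_rollout_sketch(A1: List[int], num_samples: int,
                          simulate_cost_to_go: Callable[[int], float]
                          ) -> Tuple[int, float]:
        # Estimate Q(b, a1) by Monte-Carlo averaging, then act greedily.
        q = {a1: float(np.mean([simulate_cost_to_go(a1)
                                for _ in range(num_samples)]))
             for a1 in A1}
        a1_star = min(q, key=q.get)
        return a1_star, q[a1_star]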