csle_agents.agents.pomcp package

Submodules

csle_agents.agents.pomcp.action_node module

class csle_agents.agents.pomcp.action_node.ActionNode(id: int, history: List[int], action: int, parent=None, value: float = -2000, visit_count: int = 0)[source]

Bases: csle_agents.agents.pomcp.node.Node

A node in the POMCP history tree where the last element of the history was an action

add_child(node: csle_agents.agents.pomcp.node.Node) → None[source]

Adds a child node to the tree. Since an action is always followed by an observation in the history, the next node will be an observation/belief node

Parameters: node – the new child node to add
Returns: None

get_child(key: int) → Union[None, csle_agents.agents.pomcp.node.Node][source]

Gets the child node corresponding to a specific observation

Parameters: key – the observation to get the node for
Returns: the child node or None if it was not found

update_stats(immediate_reward: float) → None[source]

Updates the mean return from the node by computing the rolling average

Parameters: immediate_reward – the latest reward sample
Returns: None
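
The rolling average described above is commonly computed incrementally, so that past returns need not be stored. The following is a minimal sketch and not the library's implementation: the class name RunningMean and its update method are hypothetical, and only the value and visit_count fields are taken from the constructor signature.

```python
class RunningMean:
    """Hypothetical stand-in for a node's value/visit_count bookkeeping."""

    def __init__(self, value: float = 0.0, visit_count: int = 0) -> None:
        self.value = value
        self.visit_count = visit_count

    def update(self, sample: float) -> None:
        # Incremental mean: new_mean = old_mean + (sample - old_mean) / n
        self.visit_count += 1
        self.value += (sample - self.value) / self.visit_count


stats = RunningMean()
for reward in [1.0, 2.0, 3.0]:
    stats.update(reward)
```

With this update rule the first sample fully overwrites the initial value, so a placeholder such as -2000 does not bias the mean once the node has been visited.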

csle_agents.agents.pomcp.belief_node module

class csle_agents.agents.pomcp.belief_node.BeliefNode(id: int, history: List[int], observation: int, parent=None, value: float = -2000, visit_count: int = 0)[source]

Bases: csle_agents.agents.pomcp.node.Node

Represents a node that holds the belief distribution given its history sequence in a belief tree. It also holds the received observation after which the belief is updated accordingly

add_child(node: csle_agents.agents.pomcp.node.Node) → None[source]

Adds a child node to this node. Since an observation is always followed by an action in the history, the next node will be an action node

Parameters: node – the child action node
Returns: None

add_particle(particle: Union[int, List[int]]) → None[source]

Adds a particle (a sampled state) to the list of particles

Parameters: particle – the particle to add
Returns: None

get_child(key: int) → Optional[csle_agents.agents.pomcp.node.Node][source]

Gets the child node corresponding to a specific action

Parameters: key – the action to get the node for
Returns: the node or None if it was not found

sample_state() → Any[source]

Samples a state from the belief state

Returns: the sampled state
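
Since the belief is represented by a list of particles, sampling a state can be sketched as a uniform draw from that list. This is an illustrative sketch under that assumption; the standalone function and its rng parameter are hypothetical and not part of the documented API.

```python
import random
from typing import Any, List


def sample_state(particles: List[Any], rng: random.Random) -> Any:
    # Each particle is one sampled state, so a uniform draw over the list
    # samples from the empirical belief that the particles represent.
    if not particles:
        raise ValueError("cannot sample from an empty particle set")
    return rng.choice(particles)


rng = random.Random(0)
particles = [0, 0, 0, 1]  # state 0 carries three quarters of the belief mass
draws = [sample_state(particles, rng) for _ in range(1000)]
```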

csle_agents.agents.pomcp.belief_tree module

class csle_agents.agents.pomcp.belief_tree.BeliefTree(root_particles: List[int], default_node_value: float, root_observation: int, initial_visit_count: int = 0)[source]

Bases: object

The belief tree of POMCP. Each node in the tree corresponds to a history of the POMDP, where a history is a sequence of actions and observations.

add(history: List[int], parent: Optional[Union[csle_agents.agents.pomcp.node.Node, csle_agents.agents.pomcp.action_node.ActionNode, csle_agents.agents.pomcp.belief_node.BeliefNode]], action: Optional[int] = None, observation: Optional[int] = None, particle: Optional[Any] = None, value: float = 0, initial_visit_count: int = 0) → csle_agents.agents.pomcp.node.Node[source]

Creates and adds a new belief node or action node to the belief search tree

Parameters

history – the history sequence
parent – either ActionNode or BeliefNode
action – action
observation – observation
particle – new node’s particle set
cost – action cost of an action node
value – the value of the node
initial_visit_count – the initial visit count

Returns

The newly added node

find_or_create(history: List[int], parent: Union[None, csle_agents.agents.pomcp.belief_node.BeliefNode, csle_agents.agents.pomcp.action_node.ActionNode], observation: int, initial_value: float, initial_visit_count: int) → csle_agents.agents.pomcp.node.Node[source]

Searches for the node that corresponds to the given history; if no such node exists, creates one using the given parameters

Parameters

history – the current history
parent – the parent of the node
observation – the latest observation
initial_value – the initial value of a created node
initial_visit_count – the initial visit count of a created node

Returns

the found or newly created node

prune(node, exclude=None)[source]

Removes the entire subtree rooted at ‘node’, except for the excluded component

Parameters

node – root of the subtree to be removed
exclude – component to exempt from removal
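
Removing a subtree while sparing one component can be sketched as a recursive walk. The TreeNode class and the nodes registry below are hypothetical and chosen only for illustration; how the library stores its nodes is not stated in this reference.

```python
from typing import Dict, List, Optional


class TreeNode:
    """Hypothetical minimal node: an id and a list of children."""

    def __init__(self, node_id: int) -> None:
        self.id = node_id
        self.children: List["TreeNode"] = []


def prune(node: TreeNode, nodes: Dict[int, TreeNode],
          exclude: Optional[TreeNode] = None) -> None:
    # Recurse into every child except the excluded one, whose subtree survives.
    for child in node.children:
        if exclude is not None and child.id == exclude.id:
            continue
        prune(child, nodes, exclude)
    # Drop the node itself from the registry once its subtree has been handled.
    nodes.pop(node.id, None)


root, a, b, c = TreeNode(0), TreeNode(1), TreeNode(2), TreeNode(3)
root.children = [a, b]
a.children = [c]
nodes = {n.id: n for n in (root, a, b, c)}
prune(root, nodes, exclude=b)
remaining = sorted(nodes)
```

In this sketch only the excluded node b is left in the registry, which mirrors the typical use of keeping the branch that matches the latest action and observation.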

csle_agents.agents.pomcp.node module

class csle_agents.agents.pomcp.node.Node(id: int, history: List[int], parent=None, value: float = -2000, visit_count: int = 0, observation: int = -1, action: int = -1)[source]

Bases: object

Abstract node type, represents a node in the lookahead tree

abstract add_child(node: csle_agents.agents.pomcp.node.Node) → None[source]

Method that adds a child to the node. Should be implemented by classes that inherit from this class

Parameters: node – the node to add
Returns: None

abstract get_child(key: int) → Optional[csle_agents.agents.pomcp.node.Node][source]

Method that gets a child of the node. Should be implemented by classes that inherit from this class

Parameters: key – the key to identify the child
Returns: the child

csle_agents.agents.pomcp.pomcp module

class csle_agents.agents.pomcp.pomcp.POMCP(A: List[int], gamma: float, env: csle_common.dao.simulation_config.base_env.BaseEnv, c: float, initial_particles: List[Any], planning_time: float = 0.5, max_particles: int = 350, reinvigoration: bool = False, reinvigorated_particles_ratio: float = 0.1, rollout_policy: Optional[csle_common.dao.training.policy.Policy] = None, value_function: Optional[Callable[[Any], float]] = None, verbose: bool = False, default_node_value: float = 0, prior_weight: float = 1.0, prior_confidence: int = 0, acquisition_function_type: csle_agents.agents.pomcp.pomcp_acquisition_function_type.POMCPAcquisitionFunctionType = POMCPAcquisitionFunctionType.UCB, c2: float = 1, use_rollout_policy: bool = False, prune_action_space: bool = False, prune_size: int = 3)[source]

Bases: object

Class that implements the POMCP algorithm

compute_belief() → Dict[int, float][source]

Computes the belief state based on the particles

Returns: the belief state

get_action() → int[source]

Gets the next action to execute based on the state of the tree. Selects the action with the highest value from the root node.

Returns: the next action
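
Selecting the action with the highest value from the root can be sketched as an argmax over the values of the root's action children. The dictionary from action to value below is a hypothetical simplification of the tree structure.

```python
from typing import Dict


def best_action(action_values: Dict[int, float]) -> int:
    # Pick the action whose root child has the highest estimated value.
    return max(action_values, key=lambda action: action_values[action])


root_action_values = {0: -1.5, 1: 0.7, 2: 0.2}  # illustrative values only
chosen = best_action(root_action_values)
```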

rollout(state: int, history: List[int], depth: int, max_rollout_depth: int, t: int) → float[source]

Performs a randomized recursive rollout search starting from the given history until the max depth has been reached

Parameters

state – the initial state of the rollout
history – the history of the root node
depth – current planning horizon
max_rollout_depth – max rollout depth
t – the time step

Returns

the estimated value of the root node
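
A recursive rollout of this kind can be sketched as follows, assuming a uniformly random rollout policy and a discount factor gamma. The step function is a hypothetical generative model and the signature is simplified compared with the documented method (no history or time step).

```python
import random
from typing import Callable, List, Tuple

# Hypothetical generative model: (state, action) -> (next_state, reward, done)
Step = Callable[[int, int], Tuple[int, float, bool]]


def rollout(state: int, depth: int, max_depth: int, gamma: float,
            actions: List[int], step: Step, rng: random.Random) -> float:
    # Stop once the rollout depth budget is exhausted.
    if depth >= max_depth:
        return 0.0
    action = rng.choice(actions)  # uniformly random rollout policy
    next_state, reward, done = step(state, action)
    if done:
        return reward
    # Discounted return: r + gamma * (return of the remaining rollout)
    return reward + gamma * rollout(next_state, depth + 1, max_depth,
                                    gamma, actions, step, rng)


def toy_step(state: int, action: int) -> Tuple[int, float, bool]:
    # Toy dynamics: every step yields reward 1 and the episode never ends.
    return state, 1.0, False


ret = rollout(0, 0, 3, 0.5, [0, 1], toy_step, random.Random(0))
```

With three steps of reward 1 and gamma = 0.5, the discounted return is 1 + 0.5 + 0.25 = 1.75.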

simulate(state: int, max_rollout_depth: int, c: float, history: List[int], t: int, max_planning_depth: int, depth=0, parent: Union[None, csle_agents.agents.pomcp.belief_node.BeliefNode, csle_agents.agents.pomcp.action_node.ActionNode] = None) → Tuple[float, int][source]

Performs the POMCP simulation starting from a given belief node and a sampled state

Parameters

state – the sampled state from the belief state of the node
max_rollout_depth – the maximum depth of rollout
max_planning_depth – the maximum depth of planning
c – the weighting factor for the UCB acquisition function
depth – the current depth of the simulation
history – the current history (history of the start node plus the simulation history)
parent – the parent node in the tree
t – the time-step

Returns

the Monte-Carlo value of the node and the current depth

solve(max_rollout_depth: int, max_planning_depth: int, t: int) → None[source]

Runs the POMCP algorithm with a given max depth for the lookahead

Parameters

max_rollout_depth – the max depth for rollout
max_planning_depth – the max depth for planning
t – the time-step

Returns

None
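
The constructor accepts a planning_time argument, which suggests that planning runs simulations until a time budget is spent. The sketch below shows such a loop under that assumption; plan_for and run_simulation are hypothetical names, and the actual control flow of solve is not specified in this reference.

```python
import time
from typing import Callable, List


def plan_for(planning_time: float, run_simulation: Callable[[], None]) -> int:
    """Calls run_simulation repeatedly until planning_time seconds have elapsed."""
    start = time.monotonic()
    simulations = 0
    while time.monotonic() - start < planning_time:
        run_simulation()  # e.g. sample a state from the root belief and simulate
        simulations += 1
    return simulations


calls: List[int] = []
n_simulations = plan_for(0.01, lambda: calls.append(1))
```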

update_tree_with_new_samples(action_sequence: List[int], observation: int, t: int) → List[Any][source]

Updates the tree after an action has been selected and a new observation has been received

Parameters

action_sequence – the action sequence that was executed
observation – the observation that was received
t – the time-step

Returns

the updated particle state

csle_agents.agents.pomcp.pomcp_acquisition_function_type module

class csle_agents.agents.pomcp.pomcp_acquisition_function_type.POMCPAcquisitionFunctionType(value)[source]

Bases: enum.IntEnum

Enum representing the different types of acquisition functions in POMCP

ALPHA_GO = 1

UCB = 0

csle_agents.agents.pomcp.pomcp_agent module

class csle_agents.agents.pomcp.pomcp_agent.POMCPAgent(simulation_env_config: csle_common.dao.simulation_config.simulation_env_config.SimulationEnvConfig, emulation_env_config: Union[None, csle_common.dao.emulation_config.emulation_env_config.EmulationEnvConfig], experiment_config: csle_common.dao.training.experiment_config.ExperimentConfig, env: Optional[csle_common.dao.simulation_config.base_env.BaseEnv] = None, training_job: Optional[csle_common.dao.jobs.training_job_config.TrainingJobConfig] = None, save_to_metastore: bool = True)[source]

Bases: csle_agents.agents.base.base_agent.BaseAgent

POMCP Agent

static compute_avg_metrics(metrics: Dict[str, List[Union[float, int]]]) → Dict[str, Union[float, int]][source]

Computes the average metrics of a dict with aggregated metrics

Parameters: metrics – the dict with the aggregated metrics
Returns: the average metrics

hparam_names() → List[str][source]

Returns: a list with the hyperparameter names

pomcp(exp_result: csle_common.dao.training.experiment_result.ExperimentResult, seed: int, training_job: csle_common.dao.jobs.training_job_config.TrainingJobConfig, random_seeds: List[int]) → csle_common.dao.training.experiment_result.ExperimentResult[source]

Runs the POMCP algorithm

Parameters

exp_result – the experiment result object to store the result
seed – the seed
training_job – the training job config
random_seeds – list of seeds

Returns

the updated experiment result and the trained policy

train() → csle_common.dao.training.experiment_execution.ExperimentExecution[source]

Performs the policy training for the given random seeds using POMCP

Returns: the training metrics and the trained policies

static update_metrics(metrics: Dict[str, List[Union[float, int]]], info: Dict[str, Union[float, int]]) → Dict[str, List[Union[float, int]]][source]

Update a dict with aggregated metrics using new information from the environment

Parameters

metrics – the dict with the aggregated metrics
info – the new information

Returns

the updated dict
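
The two static metric helpers can be sketched together: one appends the latest values to per-metric lists, the other averages those lists. This is an illustrative sketch only; the metric names "R" and "T" are hypothetical, and the library may handle missing keys differently.

```python
from typing import Dict, List, Union

Number = Union[float, int]


def update_metrics(metrics: Dict[str, List[Number]],
                   info: Dict[str, Number]) -> Dict[str, List[Number]]:
    # Append each reported value to the list kept for that metric.
    for name, value in info.items():
        metrics.setdefault(name, []).append(value)
    return metrics


def compute_avg_metrics(metrics: Dict[str, List[Number]]) -> Dict[str, float]:
    # Average every non-empty list; empty lists are skipped to avoid dividing by zero.
    return {name: sum(values) / len(values)
            for name, values in metrics.items() if values}


metrics: Dict[str, List[Number]] = {}
for info in [{"R": 1.0, "T": 10}, {"R": 3.0, "T": 20}]:
    metrics = update_metrics(metrics, info)
avg = compute_avg_metrics(metrics)
```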

csle_agents.agents.pomcp.pomcp_util module

class csle_agents.agents.pomcp.pomcp_util.POMCPUtil[source]

Bases: object

Class with utility functions related to POMCP

static alpha_go_acquisition_function(action: csle_agents.agents.pomcp.node.Node, c: float, c2: float, prior: float, prior_weight: float) → float[source]

The AlphaGo acquisition function

Parameters

action – the action node
c – the exploration parameter
c2 – the c2 parameter
prior – the prior value of the action
prior_weight – the weight to put on the prior

Returns

the acquisition value of the action

static convert_samples_to_distribution(samples) → Dict[int, float][source]

Converts a list of samples to a probability distribution

Parameters: samples – the list of samples
Returns: a dict with the sample values and their probabilities
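
Converting samples to a distribution amounts to computing relative frequencies. A minimal sketch, assuming integer-valued samples as suggested by the return type; the function name is hypothetical and this is not the library's code.

```python
from collections import Counter
from typing import Dict, List


def samples_to_distribution(samples: List[int]) -> Dict[int, float]:
    # The probability of each value is its count divided by the number of samples.
    counts = Counter(samples)
    total = len(samples)
    return {value: count / total for value, count in counts.items()}


belief = samples_to_distribution([0, 0, 1, 2])
```

The same computation yields a belief state when the samples are the particles of a belief node.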

static get_default_value(particles: List[int], action: int, default_value: float, env: csle_common.dao.simulation_config.base_env.BaseEnv, value_function: Callable[[Any], float]) → float[source]

Gets the default value of a node

Parameters

particles – the particles of the parent node
action – the action of the node
default_value – the default value
env – the black-box simulator
value_function – the value function

Returns

the value

static rand_choice(candidates: List[Any]) → Any[source]

Selects an element from a given list uniformly at random

Parameters: candidates – the list to sample from
Returns: the sample

static sample_from_distribution(probability_vector: List[float]) → int[source]

Utility function to sample from a probability vector

Parameters: probability_vector – the probability vector to sample from
Returns: the sampled element
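
Sampling an index from a probability vector can be sketched with inverse-transform sampling. The function name and the rng parameter are hypothetical; the library may rely on a different sampling routine.

```python
import random
from typing import List


def sample_index(probability_vector: List[float], rng: random.Random) -> int:
    # Draw u ~ Uniform[0, 1) and return the first index whose cumulative
    # probability exceeds u.
    u = rng.random()
    cumulative = 0.0
    for index, probability in enumerate(probability_vector):
        cumulative += probability
        if u < cumulative:
            return index
    # Guard against floating-point rounding when the vector sums to slightly < 1.
    return len(probability_vector) - 1


sampled = sample_index([0.0, 1.0, 0.0], random.Random(0))
```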

static trajectory_simulation_particles(o: int, env: csle_common.dao.simulation_config.base_env.BaseEnv, action_sequence: List[int], num_particles: int, verbose: bool = False) → List[int][source]

Performs trajectory simulations to find possible states matching the given observation

Parameters

o – the observation to match against
env – the black-box simulator to use for generating trajectories
action_sequence – the action sequence for the trajectory
num_particles – the number of particles to collect
verbose – boolean flag indicating whether logging should be verbose or not

Returns

the list of particles matching the given observation
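
Collecting particles that match an observation can be sketched as rejection sampling: trajectories are simulated and the final state is kept only when the final observation equals the target. The simulate callable, the max_attempts cap, and the toy simulator are assumptions made for this sketch and are not part of the documented signature.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical simulator: runs the action sequence and returns
# (final_state, final_observation).
Simulate = Callable[[List[int], random.Random], Tuple[int, int]]


def collect_particles(o: int, simulate: Simulate, action_sequence: List[int],
                      num_particles: int, rng: random.Random,
                      max_attempts: int = 10000) -> List[int]:
    particles: List[int] = []
    attempts = 0
    while len(particles) < num_particles and attempts < max_attempts:
        attempts += 1
        state, observation = simulate(action_sequence, rng)
        if observation == o:  # accept only trajectories that reproduce o
            particles.append(state)
    return particles


def toy_simulate(action_sequence: List[int], rng: random.Random) -> Tuple[int, int]:
    # Toy model: the final state is random and observed without noise.
    state = rng.choice([0, 1])
    return state, state


matching = collect_particles(1, toy_simulate, [0], 5, random.Random(0))
```

The attempt cap keeps the loop from running forever when the target observation is very unlikely under the simulator.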

static ucb(history_visit_count, action_visit_count)[source]

Implements the upper-confidence-bound acquisition function

Parameters

history_visit_count – counter of the number of times the history has been visited
action_visit_count – counter of the number of times the action has been taken in the history

Returns

the ucb acquisition value

static ucb_acquisition_function(action: csle_agents.agents.pomcp.node.Node, c: float) → float[source]

The UCB acquisition function

Parameters

action – the action node
c – the exploration parameter
rollout_policy – the rollout policy
prior_weight – the weight to put on the prior

Returns

the acquisition value of the action
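
The two UCB helpers above can be sketched together using the standard UCB1 form, where the exploration bonus is sqrt(ln N(h) / N(h, a)) and the acquisition value adds c times that bonus to the mean value of the action. The handling of zero visit counts and the function names below are assumptions; the library's exact formula is not given in this reference.

```python
import math


def ucb_bonus(history_visit_count: int, action_visit_count: int) -> float:
    """Exploration bonus sqrt(ln N(h) / N(h, a)); edge cases are assumptions."""
    if history_visit_count == 0:
        return 0.0  # assumed: no bonus before the history has been visited
    if action_visit_count == 0:
        return math.inf  # assumed: untried actions are explored first
    return math.sqrt(math.log(history_visit_count) / action_visit_count)


def ucb_value(q_value: float, history_visit_count: int,
              action_visit_count: int, c: float) -> float:
    # Mean value plus exploration bonus weighted by the exploration parameter c.
    return q_value + c * ucb_bonus(history_visit_count, action_visit_count)


# The less-visited action receives the larger bonus and is selected
# even though its mean value is slightly lower.
values = {
    0: ucb_value(1.0, 100, 90, c=2.0),
    1: ucb_value(0.9, 100, 10, c=2.0),
}
selected = max(values, key=lambda action: values[action])
```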

Module contents