csle_agents.agents.pomcp package

Submodules

csle_agents.agents.pomcp.action_node module

class csle_agents.agents.pomcp.action_node.ActionNode(id: int, history: List[int], action: int, parent=None, value: float = -2000, visit_count: int = 0)[source]

Bases: csle_agents.agents.pomcp.node.Node

A node in the POMCP history tree where the last element of the history was an action

add_child(node: csle_agents.agents.pomcp.node.Node) None[source]

Adds a child node to the tree. Since an action is always followed by an observation in the history, the next node will be an observation/belief node

Parameters

node – the new child node to add

Returns

None

get_child(key: int) Union[None, csle_agents.agents.pomcp.node.Node][source]

Gets the child node corresponding to a specific observation

Parameters

key – the observation to get the node for

Returns

the child node or None if it was not found

update_stats(immediate_reward: float) None[source]

Updates the mean return from the node by computing the rolling average

Parameters

immediate_reward – the latest reward sample

Returns

None
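The rolling average described for update_stats can be sketched as an incremental mean. This is a minimal standalone illustration, not the library's code: the class RunningMean and its fields are hypothetical, and it assumes the visit count is incremented before the mean is updated.

```python
class RunningMean:
    """Hypothetical stand-in for a node that tracks a mean return."""

    def __init__(self, value: float = 0.0, visit_count: int = 0) -> None:
        self.value = value
        self.visit_count = visit_count

    def update_stats(self, immediate_reward: float) -> None:
        # Incremental mean: new_mean = old_mean + (sample - old_mean) / n
        self.visit_count += 1
        self.value += (immediate_reward - self.value) / self.visit_count


node = RunningMean()
for reward in [1.0, 2.0, 6.0]:
    node.update_stats(reward)
print(node.value)  # 3.0
```

With this update rule the first sample fully replaces the initial value, so a placeholder default does not bias the mean once the node has been visited.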

csle_agents.agents.pomcp.belief_node module

class csle_agents.agents.pomcp.belief_node.BeliefNode(id: int, history: List[int], observation: int, parent=None, value: float = -2000, visit_count: int = 0)[source]

Bases: csle_agents.agents.pomcp.node.Node

Represents a node that holds the belief distribution given its history sequence in a belief tree. It also holds the received observation after which the belief is updated accordingly

add_child(node: csle_agents.agents.pomcp.node.Node) None[source]

Adds a child node to this node. Since an observation is always followed by an action in the history, the next node will be an action node

Parameters

node – the child action node

Returns

None

add_particle(particle: Union[int, List[int]]) None[source]

Adds a particle (a sample state) to the list of particles

Parameters

particle – the particle to add

Returns

None

get_child(key: int) Optional[csle_agents.agents.pomcp.node.Node][source]

Gets the child node corresponding to a specific action

Parameters

key – the action to get the node for

Returns

the node or None if it was not found

sample_state() Any[source]

Samples a state from the belief state

Returns

the sampled state
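One common way to sample a state from a particle-based belief is a uniform draw over the particle list. The sketch below assumes that approach; the free function sample_state and the rng argument are illustrative and are not the signature of the method documented above.

```python
import random
from typing import Any, List


def sample_state(particles: List[Any], rng: random.Random) -> Any:
    # Each particle is one sampled state, so a uniform draw over the list
    # returns states in proportion to how often they appear in it.
    return rng.choice(particles)


rng = random.Random(0)
particles = [0, 0, 0, 1]  # state 0 appears three times as often as state 1
draws = [sample_state(particles, rng) for _ in range(1000)]
```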

csle_agents.agents.pomcp.belief_tree module

class csle_agents.agents.pomcp.belief_tree.BeliefTree(root_particles: List[int], default_node_value: float, root_observation: int, initial_visit_count: int = 0)[source]

Bases: object

The belief tree of POMCP. Each node in the tree corresponds to a history of the POMDP, where a history is a sequence of actions and observations.
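To illustrate how histories map to nodes, the following sketch builds a tiny tree whose levels alternate between actions and observations. TinyNode is a hypothetical class written for this example only; it is not the Node, ActionNode, or BeliefNode class documented on this page.

```python
from typing import Dict, List, Optional


class TinyNode:
    """Hypothetical node: children are keyed by an action or an observation."""

    def __init__(self, history: List[int]) -> None:
        self.history = history
        self.children: Dict[int, "TinyNode"] = {}

    def get_child(self, key: int) -> Optional["TinyNode"]:
        # Returns None when no child exists for the given key
        return self.children.get(key)

    def add_child(self, key: int) -> "TinyNode":
        # The child's history is the parent's history extended by the key
        child = TinyNode(self.history + [key])
        self.children[key] = child
        return child


root = TinyNode(history=[0])            # root belief node, initial observation 0
action_node = root.add_child(2)         # history [0, 2]: last element is an action
belief_node = action_node.add_child(1)  # history [0, 2, 1]: last element is an observation
```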

add(history: List[int], parent: Optional[Union[csle_agents.agents.pomcp.node.Node, csle_agents.agents.pomcp.action_node.ActionNode, csle_agents.agents.pomcp.belief_node.BeliefNode]], action: Optional[int] = None, observation: Optional[int] = None, particle: Optional[Any] = None, value: float = 0, initial_visit_count: int = 0) csle_agents.agents.pomcp.node.Node[source]

Creates and adds a new belief node or action node to the belief search tree

Parameters
  • history – history sequence

  • parent – either ActionNode or BeliefNode

  • action – action

  • observation – observation

  • particle – new node’s particle set

  • cost – action cost of an action node

  • value – the value of the node

  • initial_visit_count – the initial visit count

Returns

The newly added node

find_or_create(history: List[int], parent: Union[None, csle_agents.agents.pomcp.belief_node.BeliefNode, csle_agents.agents.pomcp.action_node.ActionNode], observation: int, initial_value: float, initial_visit_count: int) csle_agents.agents.pomcp.node.Node[source]

Searches for the node that corresponds to the given history; if no such node exists, creates one using the given parameters

Parameters
  • history – the current history

  • parent – the parent of the node

  • observation – the latest observation

  • initial_value – the initial value of a created node

  • initial_visit_count – the initial visit count of a created node

Returns

the existing node if one was found, otherwise the newly created node

prune(node, exclude=None)[source]

Removes the entire subtree rooted at ‘node’, with exceptions

Parameters
  • node – root of the subtree to be removed

  • exclude – exception component

csle_agents.agents.pomcp.node module

class csle_agents.agents.pomcp.node.Node(id: int, history: List[int], parent=None, value: float = -2000, visit_count: int = 0, observation: int = -1, action: int = -1)[source]

Bases: object

Abstract node type, represents a node in the lookahead tree

abstract add_child(node: csle_agents.agents.pomcp.node.Node) None[source]

Method that adds a child to the node. Should be implemented by classes that inherit from this class

Parameters

node – the node to add

Returns

None

abstract get_child(key: int) Optional[csle_agents.agents.pomcp.node.Node][source]

Method that gets a child of the node. Should be implemented by classes that inherit from this class

Parameters

key – the key to identify the child

Returns

the child

csle_agents.agents.pomcp.pomcp module

class csle_agents.agents.pomcp.pomcp.POMCP(A: List[int], gamma: float, env: csle_common.dao.simulation_config.base_env.BaseEnv, c: float, initial_particles: List[Any], planning_time: float = 0.5, max_particles: int = 350, reinvigoration: bool = False, reinvigorated_particles_ratio: float = 0.1, rollout_policy: Optional[csle_common.dao.training.policy.Policy] = None, value_function: Optional[Callable[[Any], float]] = None, verbose: bool = False, default_node_value: float = 0, prior_weight: float = 1.0, prior_confidence: int = 0, acquisition_function_type: csle_agents.agents.pomcp.pomcp_acquisition_function_type.POMCPAcquisitionFunctionType = POMCPAcquisitionFunctionType.UCB, c2: float = 1, use_rollout_policy: bool = False, prune_action_space: bool = False, prune_size: int = 3)[source]

Bases: object

Class that implements the POMCP algorithm

compute_belief() Dict[int, float][source]

Computes the belief state based on the particles

Returns

the belief state

get_action() int[source]

Gets the next action to execute based on the state of the tree. Selects the action with the highest value from the root node.

Returns

the next action
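Selecting the action with the highest value at the root amounts to an argmax over the root's action children. The sketch below assumes the values have already been collected into a plain dict; the function best_action and that dict are illustrative, not part of the class above.

```python
from typing import Dict


def best_action(action_values: Dict[int, float]) -> int:
    # Argmax over the estimated values of the root's action children
    return max(action_values, key=lambda action: action_values[action])


root_values = {0: -1.5, 1: 0.25, 2: 0.1}  # action -> estimated value
chosen = best_action(root_values)
```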

rollout(state: int, history: List[int], depth: int, max_rollout_depth: int, t: int) float[source]

Performs a randomized recursive rollout search starting from the given history until the max depth has been reached

Parameters
  • state – the initial state of the rollout

  • history – the history of the root node

  • depth – current planning horizon

  • max_rollout_depth – max rollout depth

  • t – the time step

Returns

the estimated value of the root node
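A randomized recursive rollout can be sketched as a discounted sum of rewards collected while taking uniformly random actions. This is an illustration under stated assumptions: the step callable is a hypothetical black-box simulator returning (next_state, reward), and the stopping rule depth > max_rollout_depth is a choice made for this example, not necessarily the one used by the method above.

```python
import random
from typing import Callable, List, Tuple

StepFn = Callable[[int, int], Tuple[int, float]]


def rollout(state: int, depth: int, max_rollout_depth: int, gamma: float,
            step: StepFn, actions: List[int], rng: random.Random) -> float:
    # Stop once the depth limit has been exceeded
    if depth > max_rollout_depth:
        return 0.0
    action = rng.choice(actions)              # uniformly random rollout policy
    next_state, reward = step(state, action)  # query the simulator
    future = rollout(next_state, depth + 1, max_rollout_depth, gamma,
                     step, actions, rng)
    return reward + gamma * future


def toy_step(state: int, action: int) -> Tuple[int, float]:
    # Toy simulator: the state never changes and every step pays reward 1
    return state, 1.0


value = rollout(state=0, depth=0, max_rollout_depth=2, gamma=0.5,
                step=toy_step, actions=[0, 1], rng=random.Random(0))
```

With three steps (depths 0, 1, and 2) and a discount of 0.5, the toy simulator yields 1 + 0.5 + 0.25 = 1.75.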

simulate(state: int, max_rollout_depth: int, c: float, history: List[int], t: int, max_planning_depth: int, depth=0, parent: Union[None, csle_agents.agents.pomcp.belief_node.BeliefNode, csle_agents.agents.pomcp.action_node.ActionNode] = None) Tuple[float, int][source]

Performs the POMCP simulation starting from a given belief node and a sampled state

Parameters
  • state – the sampled state from the belief state of the node

  • max_rollout_depth – the maximum depth of rollout

  • max_planning_depth – the maximum depth of planning

  • c – the weighting factor for the ucb acquisition function

  • depth – the current depth of the simulation

  • history – the current history (history of the start node plus the simulation history)

  • parent – the parent node in the tree

  • t – the time-step

Returns

the Monte-Carlo value of the node and the current depth

solve(max_rollout_depth: int, max_planning_depth: int, t: int) None[source]

Runs the POMCP algorithm with a given max depth for the lookahead

Parameters
  • max_rollout_depth – the max depth for rollout

  • max_planning_depth – the max depth for planning

  • t – the time-step

Returns

None

update_tree_with_new_samples(action_sequence: List[int], observation: int, t: int) List[Any][source]

Updates the tree after an action has been selected and a new observation has been received

Parameters
  • action_sequence – the action sequence that was executed

  • observation – the observation that was received

  • t – the time-step

Returns

the updated particle state

csle_agents.agents.pomcp.pomcp_acquisition_function_type module

class csle_agents.agents.pomcp.pomcp_acquisition_function_type.POMCPAcquisitionFunctionType(value)[source]

Bases: enum.IntEnum

Enum representing the different types of acquisition functions in POMCP

ALPHA_GO = 1
UCB = 0
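Because the enum derives from enum.IntEnum, its members compare equal to their integer values and can be looked up by value. The sketch below re-declares an enum with the same two values purely for illustration; AcquisitionType is a hypothetical name and is not imported from the package.

```python
from enum import IntEnum


class AcquisitionType(IntEnum):
    """Hypothetical mirror of the two values listed above."""
    UCB = 0
    ALPHA_GO = 1


by_value = AcquisitionType(0)       # lookup by integer value
by_name = AcquisitionType["UCB"]    # lookup by member name
```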

csle_agents.agents.pomcp.pomcp_agent module

class csle_agents.agents.pomcp.pomcp_agent.POMCPAgent(simulation_env_config: csle_common.dao.simulation_config.simulation_env_config.SimulationEnvConfig, emulation_env_config: Union[None, csle_common.dao.emulation_config.emulation_env_config.EmulationEnvConfig], experiment_config: csle_common.dao.training.experiment_config.ExperimentConfig, env: Optional[csle_common.dao.simulation_config.base_env.BaseEnv] = None, training_job: Optional[csle_common.dao.jobs.training_job_config.TrainingJobConfig] = None, save_to_metastore: bool = True)[source]

Bases: csle_agents.agents.base.base_agent.BaseAgent

POMCP Agent

static compute_avg_metrics(metrics: Dict[str, List[Union[float, int]]]) Dict[str, Union[float, int]][source]

Computes the average metrics of a dict with aggregated metrics

Parameters

metrics – the dict with the aggregated metrics

Returns

the average metrics

hparam_names() List[str][source]

Returns

a list with the hyperparameter names

pomcp(exp_result: csle_common.dao.training.experiment_result.ExperimentResult, seed: int, training_job: csle_common.dao.jobs.training_job_config.TrainingJobConfig, random_seeds: List[int]) csle_common.dao.training.experiment_result.ExperimentResult[source]

Runs the POMCP algorithm

Parameters
  • exp_result – the experiment result object to store the result

  • seed – the seed

  • training_job – the training job config

  • random_seeds – list of seeds

Returns

the updated experiment result and the trained policy

train() csle_common.dao.training.experiment_execution.ExperimentExecution[source]

Performs the policy training for the given random seeds using POMCP

Returns

the training metrics and the trained policies

static update_metrics(metrics: Dict[str, List[Union[float, int]]], info: Dict[str, Union[float, int]]) Dict[str, List[Union[float, int]]][source]

Update a dict with aggregated metrics using new information from the environment

Parameters
  • metrics – the dict with the aggregated metrics

  • info – the new information

Returns

the updated dict
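The two metric helpers documented for POMCPAgent can be pictured as an append step followed by a per-key mean. The functions below are a standalone sketch with the same names, written for illustration; details such as rounding or the handling of missing keys in the actual static methods are not shown on this page and are not reproduced here.

```python
from typing import Dict, List, Union

Number = Union[float, int]


def update_metrics(metrics: Dict[str, List[Number]],
                   info: Dict[str, Number]) -> Dict[str, List[Number]]:
    # Append each new measurement to the list stored under its key
    for key, value in info.items():
        metrics.setdefault(key, []).append(value)
    return metrics


def compute_avg_metrics(metrics: Dict[str, List[Number]]) -> Dict[str, float]:
    # Average every non-empty list of measurements
    return {key: sum(values) / len(values)
            for key, values in metrics.items() if values}


metrics: Dict[str, List[Number]] = {}
update_metrics(metrics, {"R": 1.0, "T": 10})
update_metrics(metrics, {"R": 3.0, "T": 20})
averages = compute_avg_metrics(metrics)
```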

csle_agents.agents.pomcp.pomcp_util module

class csle_agents.agents.pomcp.pomcp_util.POMCPUtil[source]

Bases: object

Class with utility functions related to POMCP

static alpha_go_acquisition_function(action: csle_agents.agents.pomcp.node.Node, c: float, c2: float, prior: float, prior_weight: float) float[source]

The AlphaGo acquisition function

Parameters
  • action – the action node

  • c – the exploration parameter

  • c2 – the c2 parameter

  • prior – the prior value of the action

  • prior_weight – the weight to put on the prior

Returns

the acquisition value of the action

static convert_samples_to_distribution(samples) Dict[int, float][source]

Converts a list of samples to a probability distribution

Parameters

samples – the list of samples

Returns

a dict with the sample values and their probabilities
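Converting samples to a distribution comes down to counting occurrences and normalizing by the number of samples. The following is a minimal sketch of that idea using collections.Counter; it is written for illustration and may differ from the library's implementation.

```python
from collections import Counter
from typing import Dict, List


def convert_samples_to_distribution(samples: List[int]) -> Dict[int, float]:
    # Relative frequency of each distinct sample value
    counts = Counter(samples)
    total = len(samples)
    return {value: count / total for value, count in counts.items()}


belief = convert_samples_to_distribution([0, 0, 1, 2])
```

An empty sample list yields an empty dict in this sketch, since there is nothing to count.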

static get_default_value(particles: List[int], action: int, default_value: float, env: csle_common.dao.simulation_config.base_env.BaseEnv, value_function: Callable[[Any], float]) float[source]

Gets the default value of a node

Parameters
  • particles – the particles of the parent node

  • action – the action of the node

  • default_value – the default value

  • env – the black-box simulator

  • value_function – the value function

Returns

the value

static rand_choice(candidates: List[Any]) Any[source]

Selects an element from a given list uniformly at random

Parameters

candidates – the list to sample from

Returns

the sample

static sample_from_distribution(probability_vector: List[float]) int[source]

Utility function to sample from a probability vector

Parameters

probability_vector – the probability vector to sample from

Returns

the sampled element
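Sampling from a probability vector returns an index drawn with probability equal to the corresponding entry. The sketch below uses random.choices from the standard library; the rng argument is an addition for reproducibility and is not part of the signature documented above.

```python
import random
from typing import List


def sample_from_distribution(probability_vector: List[float],
                             rng: random.Random) -> int:
    # Draw one index, weighted by the entries of the probability vector
    indices = range(len(probability_vector))
    return rng.choices(indices, weights=probability_vector, k=1)[0]


rng = random.Random(1)
sure_thing = sample_from_distribution([0.0, 1.0, 0.0], rng)
samples = [sample_from_distribution([0.9, 0.1], rng) for _ in range(1000)]
```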

static trajectory_simulation_particles(o: int, env: csle_common.dao.simulation_config.base_env.BaseEnv, action_sequence: List[int], num_particles: int, verbose: bool = False) List[int][source]

Performs trajectory simulations to find possible states matching the given observation

Parameters
  • o – the observation to match against

  • env – the black-box simulator to use for generating trajectories

  • action_sequence – the action sequence for the trajectory

  • num_particles – the number of particles to collect

  • verbose – boolean flag indicating whether logging should be verbose or not

Returns

the list of particles matching the given observation
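Collecting particles that match an observation can be viewed as rejection sampling: simulate a trajectory, keep the final state only if the simulated observation equals the target, and repeat until enough particles are found. The sketch below assumes that scheme. The simulate_trajectory callable, the toy simulator, and the max_attempts guard are all hypothetical and stand in for the BaseEnv simulator used by the method above.

```python
import random
from typing import Callable, List, Tuple

TrajectoryFn = Callable[[List[int], random.Random], Tuple[int, int]]


def collect_matching_particles(o: int, simulate_trajectory: TrajectoryFn,
                               action_sequence: List[int], num_particles: int,
                               rng: random.Random,
                               max_attempts: int = 10_000) -> List[int]:
    particles: List[int] = []
    attempts = 0
    # Rejection sampling: keep only states whose observation matches o
    while len(particles) < num_particles and attempts < max_attempts:
        attempts += 1
        state, observation = simulate_trajectory(action_sequence, rng)
        if observation == o:
            particles.append(state)
    return particles


def toy_trajectory(action_sequence: List[int],
                   rng: random.Random) -> Tuple[int, int]:
    # Toy simulator: final state is the action sum plus 0 or 1,
    # and the observation is the parity of the final state
    state = sum(action_sequence) + rng.randint(0, 1)
    return state, state % 2


particles = collect_matching_particles(
    o=1, simulate_trajectory=toy_trajectory, action_sequence=[1, 2],
    num_particles=5, rng=random.Random(0))
```

In the toy simulator the final state is either 3 or 4, and only state 3 produces observation 1, so every collected particle equals 3.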

static ucb(history_visit_count, action_visit_count)[source]

Implements the upper-confidence-bound acquisition function

Parameters
  • history_visit_count – counter of the number of times the history has been visited

  • action_visit_count – counter of the number of times the action has been taken in the history

Returns

the ucb acquisition value

static ucb_acquisition_function(action: csle_agents.agents.pomcp.node.Node, c: float) float[source]

The UCB acquisition function

Parameters
  • action – the action node

  • c – the exploration parameter

  • rollout_policy – the rollout policy

  • prior_weight – the weight to put on the prior

Returns

the acquisition value of the action
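For orientation, the textbook UCB1 rule adds an exploration bonus of sqrt(log(N) / n) to an action's value estimate, where N is the visit count of the history and n is the visit count of the action. The sketch below implements that textbook form; the function names ucb_bonus and ucb_score, and the handling of zero counts, are assumptions made for this example and may not match the static methods documented above.

```python
import math


def ucb_bonus(history_visit_count: int, action_visit_count: int) -> float:
    # Assumption: no bonus before the history has been visited,
    # and an infinite bonus for actions that have never been tried
    if history_visit_count == 0:
        return 0.0
    if action_visit_count == 0:
        return math.inf
    return math.sqrt(math.log(history_visit_count) / action_visit_count)


def ucb_score(value: float, c: float, history_visit_count: int,
              action_visit_count: int) -> float:
    # Value estimate plus the exploration bonus weighted by c
    return value + c * ucb_bonus(history_visit_count, action_visit_count)


tried = ucb_score(value=1.0, c=2.0, history_visit_count=10, action_visit_count=5)
untried = ucb_score(value=0.0, c=2.0, history_visit_count=10, action_visit_count=0)
```

A larger c shifts the balance toward exploration. In this sketch untried actions always score highest as long as c is positive.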

Module contents