csle_agents.agents.pomcp package
csle_agents.agents.pomcp.action_node module
- class csle_agents.agents.pomcp.action_node.ActionNode(id: int, history: List[int], action: int, parent=None, value: float = -2000, visit_count: int = 0)[source]
A node in the POMCP history tree where the last element of the history was an action
- add_child(node: Node) None [source]
Adds a child node to the tree. Since an action is always followed by an observation in the history, the next node will be an observation/belief node
- Parameters
node – the new child node to add
- Returns
csle_agents.agents.pomcp.belief_node module
- class csle_agents.agents.pomcp.belief_node.BeliefNode(id: int, history: List[int], observation: int, parent=None, value: float = -2000, visit_count: int = 0)[source]
A node in the belief tree that holds the belief distribution for its history sequence. It also stores the most recently received observation, after which the belief is updated
- add_child(node: Node) None [source]
Adds a child node to this node. Since an observation is always followed by an action in the history, the next node will be an action node
- Parameters
node – the child action node
- Returns
- add_particle(particle: Union[int, List[int]]) None [source]
Adds a particle (a sample state) to the list of particles
- Parameters
particle – the particle to add
- Returns
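The alternation between action nodes and belief nodes described above can be illustrated with a minimal standalone sketch. The classes `SketchBeliefNode` and `SketchActionNode` are hypothetical simplifications and not the package's implementation; storing children and particles in plain lists is an assumption.

```python
from typing import Any, List, Optional


class SketchActionNode:
    """Hypothetical action node: the history ends with an action."""

    def __init__(self, history: List[int], action: int, parent: Optional[Any] = None) -> None:
        self.history = history
        self.action = action
        self.parent = parent
        self.children: List["SketchBeliefNode"] = []  # assumed storage: a plain list

    def add_child(self, node: "SketchBeliefNode") -> None:
        # An action is always followed by an observation in the history.
        self.children.append(node)


class SketchBeliefNode:
    """Hypothetical belief node: the history ends with an observation."""

    def __init__(self, history: List[int], observation: int, parent: Optional[Any] = None) -> None:
        self.history = history
        self.observation = observation
        self.parent = parent
        self.children: List[SketchActionNode] = []
        self.particles: List[Any] = []  # sampled states approximating the belief

    def add_child(self, node: SketchActionNode) -> None:
        # An observation is always followed by an action in the history.
        self.children.append(node)

    def add_particle(self, particle: Any) -> None:
        self.particles.append(particle)


# Build the chain: observation 0 -> action 1 -> observation 2
root = SketchBeliefNode(history=[0], observation=0)
action_node = SketchActionNode(history=[0, 1], action=1, parent=root)
root.add_child(action_node)
leaf = SketchBeliefNode(history=[0, 1, 2], observation=2, parent=action_node)
action_node.add_child(leaf)
leaf.add_particle(3)
```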
csle_agents.agents.pomcp.belief_tree module
- class csle_agents.agents.pomcp.belief_tree.BeliefTree(root_particles: List[int], default_node_value: float, root_observation: int, initial_visit_count: int = 0)[source]
The belief tree of POMCP. Each node in the tree corresponds to a history of the POMDP, where a history is a sequence of actions and observations.
- add(history: List[int], parent: Optional[Union[Node, ActionNode, BeliefNode]], action: Optional[int] = None, observation: Optional[int] = None, particle: Optional[Any] = None, value: float = 0, initial_visit_count: int = 0) Node [source]
Creates and adds a new belief node or action node to the belief search tree
- Parameters
history – the history sequence
parent – either ActionNode or BeliefNode
action – action
observation – observation
particle – new node’s particle set
cost – action cost of an action node
value – the value of the node
initial_visit_count – the initial visit count
- Returns
The newly added node
- find_or_create(history: List[int], parent: Union[None, BeliefNode, ActionNode], observation: int, initial_value: float, initial_visit_count: int) Node [source]
Searches for the node that corresponds to the given history; if no such node exists, creates one using the given parameters
- Parameters
history – the current history
parent – the parent of the node
observation – the latest observation
initial_value – the initial value of a created node
initial_visit_count – the initial visit count of a created node
- Returns
the existing node if found, otherwise the newly created node
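A find-or-create lookup of this kind can be sketched as follows. This is an illustration under stated assumptions, not the package's code: `SketchNode` is a hypothetical class, the parent is assumed to be non-null, and matching a child by its observation is an assumed lookup rule.

```python
from typing import List, Optional


class SketchNode:
    """Hypothetical tree node used only for this illustration."""

    def __init__(self, history: List[int], observation: int, value: float,
                 visit_count: int, parent: Optional["SketchNode"] = None) -> None:
        self.history = history
        self.observation = observation
        self.value = value
        self.visit_count = visit_count
        self.parent = parent
        self.children: List["SketchNode"] = []


def find_or_create(parent: SketchNode, history: List[int], observation: int,
                   initial_value: float, initial_visit_count: int) -> SketchNode:
    # Assumed lookup rule: reuse the child that carries the same observation.
    for child in parent.children:
        if child.observation == observation:
            return child
    # No match: create a new node and attach it to the parent.
    node = SketchNode(history=history + [observation], observation=observation,
                      value=initial_value, visit_count=initial_visit_count, parent=parent)
    parent.children.append(node)
    return node


root = SketchNode(history=[0], observation=0, value=0.0, visit_count=0)
first = find_or_create(root, root.history, observation=4, initial_value=0.0, initial_visit_count=0)
second = find_or_create(root, root.history, observation=4, initial_value=0.0, initial_visit_count=0)
```

The second call returns the node created by the first call, so the tree does not grow when an observation is seen again.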
csle_agents.agents.pomcp.node module
- class csle_agents.agents.pomcp.node.Node(id: int, history: List[int], parent=None, value: float = -2000, visit_count: int = 0, observation: int = -1, action: int = -1)[source]
Abstract node type, represents a node in the lookahead tree
csle_agents.agents.pomcp.pomcp module
- class csle_agents.agents.pomcp.pomcp.POMCP(A: List[int], gamma: float, env: BaseEnv, c: float, initial_particles: List[Any], planning_time: float = 0.5, max_particles: int = 350, reinvigoration: bool = False, reinvigorated_particles_ratio: float = 0.1, rollout_policy: Optional[Policy] = None, value_function: Optional[Callable[[Any], float]] = None, verbose: bool = False, default_node_value: float = 0, prior_weight: float = 1.0, prior_confidence: int = 0, acquisition_function_type: POMCPAcquisitionFunctionType = POMCPAcquisitionFunctionType.UCB, c2: float = 1, use_rollout_policy: bool = False, prune_action_space: bool = False, prune_size: int = 3)[source]
Class that implements the POMCP algorithm
- compute_belief() Dict[int, float] [source]
Computes the belief state based on the particles
- Returns
the belief state
- get_action() int [source]
Gets the next action to execute based on the state of the tree. Selects the action with the highest value from the root node.
- Returns
the next action
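Selecting the highest-valued action can be sketched as below. The function `best_action` and the dictionary of per-action value estimates are hypothetical; the package reads the values from the root node's children, and its tie-breaking rule is not documented here.

```python
from typing import Dict


def best_action(action_values: Dict[int, float]) -> int:
    """Return the action whose estimated value is largest (greedy choice)."""
    # On ties, max() keeps the first maximal key in iteration order (an assumption
    # of this sketch, not a documented property of the package).
    return max(action_values, key=lambda a: action_values[a])


chosen = best_action({0: -1.0, 1: 2.5, 2: 0.3})
```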
- rollout(state: int, history: List[int], depth: int, max_rollout_depth: int, t: int) float [source]
Performs a randomized recursive rollout search starting from the given history until the max depth has been reached
- Parameters
state – the initial state of the rollout
history – the history of the root node
depth – current planning horizon
max_rollout_depth – max rollout depth
t – the time step
- Returns
the estimated value of the root node
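A randomized recursive rollout can be sketched as follows. This is a simplified illustration, not the package's method: the `step` callable standing in for the black-box simulator, its `(next_state, reward, done)` return shape, and the uniform random action choice are all assumptions.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical simulator interface: (state, action) -> (next_state, reward, done)
StepFn = Callable[[int, int], Tuple[int, float, bool]]


def rollout(state: int, depth: int, max_rollout_depth: int, actions: List[int],
            gamma: float, step: StepFn, rng: random.Random) -> float:
    """Estimate the discounted return from `state` by acting uniformly at random."""
    if depth >= max_rollout_depth:
        return 0.0  # horizon reached: no further reward is counted
    action = rng.choice(actions)
    next_state, reward, done = step(state, action)
    if done:
        return reward
    future = rollout(next_state, depth + 1, max_rollout_depth, actions, gamma, step, rng)
    return reward + gamma * future


def toy_step(state: int, action: int) -> Tuple[int, float, bool]:
    # Toy dynamics: the state never changes, every step yields reward 1.
    return state, 1.0, False


value = rollout(state=0, depth=0, max_rollout_depth=3, actions=[0, 1],
                gamma=0.5, step=toy_step, rng=random.Random(0))
# With three steps of reward 1 and gamma = 0.5: 1 + 0.5 + 0.25 = 1.75
```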
- simulate(state: int, max_rollout_depth: int, c: float, history: List[int], t: int, max_planning_depth: int, depth=0, parent: Union[None, BeliefNode, ActionNode] = None) Tuple[float, int] [source]
Performs the POMCP simulation starting from a given belief node and a sampled state
- Parameters
state – the sampled state from the belief state of the node
max_rollout_depth – the maximum depth of rollout
max_planning_depth – the maximum depth of planning
c – the weighting factor for the ucb acquisition function
depth – the current depth of the simulation
history – the current history (history of the start node plus the simulation history)
parent – the parent node in the tree
t – the time-step
- Returns
the Monte-Carlo value of the node and the current depth
- solve(max_rollout_depth: int, max_planning_depth: int, t: int) None [source]
Runs the POMCP algorithm with a given max depth for the lookahead
- Parameters
max_rollout_depth – the max depth for rollout
max_planning_depth – the max depth for planning
t – the time-step
- Returns
- update_tree_with_new_samples(action_sequence: List[int], observation: int, t: int) List[Any] [source]
Updates the tree after an action has been selected and a new observation has been received
- Parameters
action_sequence – the action sequence that was executed
observation – the observation that was received
t – the time-step
- Returns
the updated particle state
csle_agents.agents.pomcp.pomcp_acquisition_function_type module
csle_agents.agents.pomcp.pomcp_agent module
- class csle_agents.agents.pomcp.pomcp_agent.POMCPAgent(simulation_env_config: SimulationEnvConfig, emulation_env_config: Union[None, EmulationEnvConfig], experiment_config: ExperimentConfig, env: Optional[BaseEnv] = None, training_job: Optional[TrainingJobConfig] = None, save_to_metastore: bool = True)[source]
- static compute_avg_metrics(metrics: Dict[str, List[Union[float, int]]]) Dict[str, Union[float, int]] [source]
Computes the average metrics of a dict with aggregated metrics
- Parameters
metrics – the dict with the aggregated metrics
- Returns
the average metrics
- pomcp(exp_result: ExperimentResult, seed: int, training_job: TrainingJobConfig, random_seeds: List[int]) ExperimentResult [source]
Runs the POMCP algorithm
- Parameters
exp_result – the experiment result object to store the result
seed – the seed
training_job – the training job config
random_seeds – list of seeds
- Returns
the updated experiment result and the trained policy
- train() ExperimentExecution [source]
Performs the policy training for the given random seeds using POMCP
- Returns
the training metrics and the trained policies
- static update_metrics(metrics: Dict[str, List[Union[float, int]]], info: Dict[str, Union[float, int]]) Dict[str, List[Union[float, int]]] [source]
Update a dict with aggregated metrics using new information from the environment
- Parameters
metrics – the dict with the aggregated metrics
info – the new information
- Returns
the updated dict
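The two static metric helpers, `update_metrics` and `compute_avg_metrics`, can be sketched as below. This is a minimal standalone version under assumptions: creating a new list for a previously unseen key and skipping empty lists when averaging are choices made for the sketch, not confirmed behaviour of the package.

```python
from typing import Dict, List, Union

Number = Union[float, int]


def update_metrics(metrics: Dict[str, List[Number]],
                   info: Dict[str, Number]) -> Dict[str, List[Number]]:
    """Append each value in `info` to the list stored under the same key."""
    for key, value in info.items():
        metrics.setdefault(key, []).append(value)  # assumed: unseen keys start a new list
    return metrics


def compute_avg_metrics(metrics: Dict[str, List[Number]]) -> Dict[str, Number]:
    """Average each list of aggregated values; empty lists are skipped (assumption)."""
    return {key: sum(values) / len(values) for key, values in metrics.items() if values}


aggregated: Dict[str, List[Number]] = {}
update_metrics(aggregated, {"R": 1, "T": 10})
update_metrics(aggregated, {"R": 3, "T": 20})
averages = compute_avg_metrics(aggregated)
```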
csle_agents.agents.pomcp.pomcp_util module
- class csle_agents.agents.pomcp.pomcp_util.POMCPUtil[source]
Class with utility functions related to POMCP
- static alpha_go_acquisition_function(action: Node, c: float, c2: float, prior: float, prior_weight: float) float [source]
The AlphaGo acquisition function
- Parameters
action – the action node
c – the exploration parameter
c2 – the c2 parameter
prior – the prior of the action
prior_weight – the weight to put on the prior
- Returns
the acquisition value of the action
- static convert_samples_to_distribution(samples) Dict[int, float] [source]
Converts a list of samples to a probability distribution
- Parameters
samples – the list of samples
- Returns
a dict with the sample values and their probabilities
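Converting samples to an empirical distribution amounts to counting relative frequencies. A minimal sketch, assuming the samples are hashable values such as integer states (the name `samples_to_distribution` is hypothetical):

```python
from collections import Counter
from typing import Dict, List


def samples_to_distribution(samples: List[int]) -> Dict[int, float]:
    """Map each distinct sample value to its relative frequency."""
    counts = Counter(samples)
    total = len(samples)
    return {value: count / total for value, count in counts.items()}


distribution = samples_to_distribution([0, 0, 1, 2])
```

Applied to the particles of a belief node, the same computation yields a particle-based approximation of the belief state.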
- static get_default_value(particles: List[int], action: int, default_value: float, env: BaseEnv, value_function: Callable[[Any], float]) float [source]
Gets the default value of a node
- Parameters
particles – the particles of the parent node
action – the action of the node
default_value – the default value
env – the black-box simulator
value_function – the value function
- Returns
the value
- static rand_choice(candidates: List[Any]) Any [source]
Selects an element from a given list uniformly at random
- Parameters
candidates – the list to sample from
- Returns
the sample
- static sample_from_distribution(probability_vector: List[float]) int [source]
Utility function to sample from a probability vector
- Parameters
probability_vector – the probability vector to sample from
- Returns
the sampled element
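Sampling an index from a probability vector can be sketched with the standard library. This version is an illustration only; passing an explicit `random.Random` instance for reproducibility is an assumption of the sketch and not part of the documented signature.

```python
import random
from typing import List


def sample_from_distribution(probability_vector: List[float], rng: random.Random) -> int:
    """Draw an index i with probability probability_vector[i]."""
    indices = range(len(probability_vector))
    return rng.choices(indices, weights=probability_vector, k=1)[0]


rng = random.Random(0)
sampled = sample_from_distribution([0.0, 1.0, 0.0], rng)
```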
- static trajectory_simulation_particles(o: int, env: BaseEnv, action_sequence: List[int], num_particles: int, verbose: bool = False) List[int] [source]
Performs trajectory simulations to find possible states that match the given observation
- Parameters
o – the observation to match against
env – the black-box simulator to use for generating trajectories
action_sequence – the action sequence for the trajectory
num_particles – the number of particles to collect
verbose – boolean flag indicating whether logging should be verbose or not
- Returns
the list of particles matching the given observation
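Collecting particles that match an observation is a form of rejection sampling: simulate, keep the final state if the observation matches, and repeat. The sketch below is a simplification under assumptions: the environment and action sequence are abstracted into a hypothetical `simulate_trajectory` callable returning `(final_state, final_observation)`, and the `max_attempts` cap is added here to guarantee termination.

```python
import random
from typing import Callable, List, Tuple


def collect_matching_particles(o: int,
                               simulate_trajectory: Callable[[], Tuple[int, int]],
                               num_particles: int,
                               max_attempts: int = 10000) -> List[int]:
    """Keep final states of simulated trajectories whose observation equals `o`."""
    particles: List[int] = []
    attempts = 0
    while len(particles) < num_particles and attempts < max_attempts:
        state, observation = simulate_trajectory()
        if observation == o:
            particles.append(state)  # accepted: consistent with the observation
        attempts += 1
    return particles


rng = random.Random(0)


def toy_trajectory() -> Tuple[int, int]:
    # Toy simulator: the observation reveals only the parity of the state.
    state = rng.randint(0, 3)
    return state, state % 2


particles = collect_matching_particles(o=1, simulate_trajectory=toy_trajectory, num_particles=5)
```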
- static ucb(history_visit_count, action_visit_count)[source]
Implements the upper-confidence-bound acquisition function
- Parameters
history_visit_count – counter of the number of times the history has been visited
action_visit_count – counter of the number of times the action has been taken in the history
- Returns
the ucb acquisition value
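The exploration term of a UCB1-style rule can be sketched from the two visit counts. The formula below is the standard form and the `+ 1` smoothing of both counts is an assumption made to avoid `log(0)` and division by zero; the package's exact expression, and how it combines this term with `c` and the node value, is not stated in this reference.

```python
import math


def ucb(history_visit_count: int, action_visit_count: int) -> float:
    """Exploration bonus: large for rarely tried actions, shrinking as they are visited."""
    # Assumed smoothing: add 1 to both counts so unvisited nodes are well defined.
    return math.sqrt(math.log(history_visit_count + 1) / (action_visit_count + 1))


rarely_tried = ucb(history_visit_count=10, action_visit_count=1)
often_tried = ucb(history_visit_count=10, action_visit_count=5)
```

An action that has been tried less often in a given history receives the larger bonus, which steers the search toward under-explored actions.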