csle_agents.agents.vi package
Submodules
csle_agents.agents.vi.vi_agent module
- class csle_agents.agents.vi.vi_agent.VIAgent(simulation_env_config: SimulationEnvConfig, experiment_config: ExperimentConfig, training_job: Optional[TrainingJobConfig] = None, save_to_metastore: bool = True, env: Optional[BaseEnv] = None)[source]
Bases: BaseAgent
Value Iteration Agent
- create_policy_from_value_function(num_states: int, num_actions: int, V: ndarray[Any, dtype[Any]], T: ndarray[Any, dtype[Any]], discount_factor: float, R: ndarray[Any, dtype[Any]]) ndarray[Any, dtype[Any]] [source]
Creates a tabular policy from a value function
- Parameters
num_states – the number of states
num_actions – the number of actions
V – the value function
T – the transition operator
discount_factor – the discount factor
R – the reward function
- Returns
the tabular policy
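The policy is obtained by acting greedily with respect to V: for each state, the action with the highest one-step lookahead value is selected. A minimal sketch of this extraction, assuming T is indexed as T[a][s][s'] and R as R[a][s]; the function name and shapes are illustrative, not the agent's exact implementation:

```python
import numpy as np

def greedy_policy_from_value_function(num_states, num_actions, V, T, discount_factor, R):
    """Illustrative sketch: builds a one-hot tabular policy that acts greedily w.r.t. V."""
    policy = np.zeros((num_states, num_actions))
    for s in range(num_states):
        # Q(s, a) = R[a][s] + gamma * sum_{s'} T[a][s][s'] * V[s']
        q_values = np.array([R[a][s] + discount_factor * np.dot(T[a][s], V)
                             for a in range(num_actions)])
        policy[s][int(np.argmax(q_values))] = 1.0  # one-hot greedy action
    return policy
```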
- evaluate_policy(policy: ndarray[Any, dtype[Any]], eval_batch_size: int) float [source]
Evaluates a tabular policy
- Parameters
policy – the tabular policy to evaluate
eval_batch_size – the batch size
- Returns
the average return of the evaluated policy
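A hypothetical sketch of this kind of rollout-based evaluation, assuming a gym-style environment whose observations are discrete state indices; the helper name, the episode cap, and the five-tuple step interface are assumptions rather than the agent's actual code:

```python
import numpy as np

def evaluate_tabular_policy(env, policy, eval_batch_size, max_steps=1000):
    """Estimates the mean episode return of a tabular policy over eval_batch_size rollouts."""
    returns = []
    for _ in range(eval_batch_size):
        state, _ = env.reset()
        episode_return = 0.0
        for _ in range(max_steps):
            action = int(np.argmax(policy[state]))  # act greedily w.r.t. the tabular policy
            state, reward, terminated, truncated, _ = env.step(action)
            episode_return += reward
            if terminated or truncated:
                break
        returns.append(episode_return)
    return float(np.mean(returns))
```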
- one_step_lookahead(state, V, num_actions, num_states, T, discount_factor, R) ndarray[Any, dtype[Any]] [source]
Performs a one-step lookahead for value iteration
- Parameters
state – the current state
V – the current value function
num_actions – the number of actions
num_states – the number of states
T – the transition kernel
discount_factor – the discount factor
R – the table with rewards
- Returns
an array with lookahead values
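The lookahead evaluates every action from the given state against the current value function. A minimal sketch under the same indexing assumptions as above (T[a][s][s'], R[a][s]); the name is illustrative:

```python
import numpy as np

def one_step_lookahead_sketch(state, V, num_actions, T, discount_factor, R):
    # A[a] = R[a][state] + gamma * sum_{s'} T[a][state][s'] * V[s']
    A = np.zeros(num_actions)
    for a in range(num_actions):
        A[a] = R[a][state] + discount_factor * np.dot(T[a][state], V)
    return A
```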
- train() ExperimentExecution [source]
Runs the value iteration algorithm to compute V*
- Returns
the results
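A hypothetical usage outline; simulation_env_config and experiment_config are assumed to be prepared SimulationEnvConfig and ExperimentConfig instances built elsewhere, so this is a sketch rather than a runnable script:

```python
from csle_agents.agents.vi.vi_agent import VIAgent

# simulation_env_config and experiment_config are assumed to exist (placeholders)
agent = VIAgent(simulation_env_config=simulation_env_config,
                experiment_config=experiment_config,
                save_to_metastore=False)
execution = agent.train()  # runs value iteration and returns an ExperimentExecution
```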
- value_iteration(exp_result: ExperimentResult, seed: int) ExperimentResult [source]
Runs the value iteration algorithm
- Parameters
exp_result – the experiment result object
seed – the random seed
- Returns
the updated experiment result
- vi(T: ndarray[Any, dtype[Any]], num_states: int, num_actions: int, R: ndarray[Any, dtype[Any]], theta=0.0001, discount_factor=1.0) Tuple[ndarray[Any, dtype[Any]], ndarray[Any, dtype[Any]], List[Any], List[Any], List[Any]] [source]
An implementation of the value iteration algorithm
- Parameters
T – the transition kernel
num_states – the number of states
num_actions – the number of actions
R – the table with rewards
theta – the convergence threshold
discount_factor – the discount factor
- Returns
the greedy policy, the value function, the deltas, and the average returns
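To make the tensor shapes and the convergence criterion concrete, here is a small self-contained value iteration sketch on a toy two-state MDP; the indexing convention T[a][s][s'] and R[a][s] is an assumption made for illustration, and the numbers are made up:

```python
import numpy as np

num_states, num_actions = 2, 2
# T[a][s][s'] = P(s' | s, a), R[a][s] = expected reward (toy numbers)
T = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.6, 0.4]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
theta, discount_factor = 1e-4, 0.9

V = np.zeros(num_states)
while True:
    delta = 0.0
    for s in range(num_states):
        # one-step lookahead: Q(s, a) for all actions
        q = np.array([R[a][s] + discount_factor * np.dot(T[a][s], V)
                      for a in range(num_actions)])
        best = q.max()
        delta = max(delta, abs(best - V[s]))
        V[s] = best
    if delta < theta:  # stop once the largest value change falls below theta
        break

# greedy policy extraction from the converged value function
policy = np.zeros((num_states, num_actions))
for s in range(num_states):
    q = np.array([R[a][s] + discount_factor * np.dot(T[a][s], V)
                  for a in range(num_actions)])
    policy[s][int(np.argmax(q))] = 1.0
print(V, policy)
```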