csle_agents.agents.vi package
Submodules
csle_agents.agents.vi.vi_agent module
- class csle_agents.agents.vi.vi_agent.VIAgent(simulation_env_config: csle_common.dao.simulation_config.simulation_env_config.SimulationEnvConfig, experiment_config: csle_common.dao.training.experiment_config.ExperimentConfig, training_job: Optional[csle_common.dao.jobs.training_job_config.TrainingJobConfig] = None, save_to_metastore: bool = True, env: Optional[csle_common.dao.simulation_config.base_env.BaseEnv] = None)[source]
Bases: csle_agents.agents.base.base_agent.BaseAgent
Value Iteration Agent
- create_policy_from_value_function(num_states: int, num_actions: int, V: numpy.ndarray[Any, numpy.dtype[Any]], T: numpy.ndarray[Any, numpy.dtype[Any]], discount_factor: float, R: numpy.ndarray[Any, numpy.dtype[Any]]) numpy.ndarray[Any, numpy.dtype[Any]] [source]
Creates a tabular policy from a value function
- Parameters
num_states – the number of states
num_actions – the number of actions
V – the value function
T – the transition operator
discount_factor – the discount factor
R – the reward function
- Returns
the tabular policy
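For intuition, the sketch below shows how a greedy tabular policy can be extracted from a value function. It is a minimal illustration, not the csle implementation: the index conventions T[s, a, s'] and R[s, a] are assumptions.

import numpy as np

def greedy_policy_from_v(num_states: int, num_actions: int, V, T,
                         discount_factor: float, R):
    # Illustrative sketch; assumes T[s, a, s'] and R[s, a] index conventions
    policy = np.zeros((num_states, num_actions))
    for s in range(num_states):
        # Q(s, a) = sum_{s'} T[s, a, s'] * (R[s, a] + gamma * V[s'])
        q = np.array([
            np.sum(T[s, a] * (R[s, a] + discount_factor * V))
            for a in range(num_actions)
        ])
        policy[s, int(np.argmax(q))] = 1.0  # deterministic greedy choice
    return policy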
- evaluate_policy(policy: numpy.ndarray[Any, numpy.dtype[Any]], eval_batch_size: int) float [source]
Evaluates a tabular policy
- Parameters
policy – the tabular policy to evaluate
eval_batch_size – the batch size
- Returns
the average return of the evaluated policy
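A minimal sketch of what such a batch evaluation could look like against a Gymnasium-style environment; the env variable and its reset()/step() API are assumptions for illustration, not the agent's configured simulation environment:

import numpy as np

def evaluate_tabular_policy(env, policy, eval_batch_size: int) -> float:
    # Sketch only; assumes a Gymnasium-style reset()/step() API
    returns = []
    for _ in range(eval_batch_size):
        state, _ = env.reset()
        terminated = truncated = False
        episode_return = 0.0
        while not (terminated or truncated):
            action = int(np.argmax(policy[state]))  # act greedily w.r.t. the tabular policy
            state, reward, terminated, truncated, _ = env.step(action)
            episode_return += reward
        returns.append(episode_return)
    return float(np.mean(returns))  # average return over the batch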
- one_step_lookahead(state, V, num_actions, num_states, T, discount_factor, R) numpy.ndarray[Any, numpy.dtype[Any]] [source]
Performs a one-step lookahead for value iteration
- Parameters
state – the current state
V – the current value function
num_actions – the number of actions
num_states – the number of states
T – the transition kernel
discount_factor – the discount factor
R – the table with rewards
- Returns
an array with lookahead values
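As a sketch, the lookahead amounts to computing Q(s, a) = sum over s' of T[s, a, s'] * (R[s, a] + gamma * V[s']) for every action; the index conventions below are assumptions:

import numpy as np

def one_step_lookahead_sketch(state, V, num_actions, num_states, T,
                              discount_factor, R):
    # Sketch; assumes T[s, a, s'] and R[s, a] index conventions
    A = np.zeros(num_actions)
    for a in range(num_actions):
        for next_state in range(num_states):
            A[a] += T[state, a, next_state] * (
                R[state, a] + discount_factor * V[next_state])
    return A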
- train() csle_common.dao.training.experiment_execution.ExperimentExecution [source]
Runs the value iteration algorithm to compute V*
- Returns
the results
- value_iteration(exp_result: csle_common.dao.training.experiment_result.ExperimentResult, seed: int) csle_common.dao.training.experiment_result.ExperimentResult [source]
Runs the value iteration algorithm
- Parameters
exp_result – the experiment result object
seed – the random seed
- Returns
the updated experiment result
- vi(T: numpy.ndarray[Any, numpy.dtype[Any]], num_states: int, num_actions: int, R: numpy.ndarray[Any, numpy.dtype[Any]], theta=0.0001, discount_factor=1.0) Tuple[numpy.ndarray[Any, numpy.dtype[Any]], numpy.ndarray[Any, numpy.dtype[Any]], List[Any], List[Any], List[Any]] [source]
An implementation of the Value Iteration algorithm
- Parameters
T – the transition kernel
num_states – the number of states
num_actions – the number of actions
R – the table with rewards
theta – the convergence threshold
discount_factor – the discount factor
- Returns
(greedy policy, value function, deltas, average_returns)
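Putting it together, a standard value-iteration loop sweeps all states, replaces V[s] with the best one-step-lookahead value, and stops once the largest update falls below theta. The sketch below follows that textbook algorithm under the same assumed index conventions; it is not the csle source:

import numpy as np

def value_iteration_sketch(T, num_states, num_actions, R, theta=1e-4,
                           discount_factor=1.0):
    # Standard VI sketch; assumes T[s, a, s'] and R[s, a] conventions
    V = np.zeros(num_states)
    deltas = []
    while True:
        delta = 0.0
        for s in range(num_states):
            # Best achievable one-step-lookahead value in state s
            q = np.array([
                np.sum(T[s, a] * (R[s, a] + discount_factor * V))
                for a in range(num_actions)
            ])
            best = float(np.max(q))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        deltas.append(delta)  # track convergence of the Bellman residual
        if delta < theta:
            break
    # Greedy policy extraction from the converged value function
    policy = np.zeros((num_states, num_actions))
    for s in range(num_states):
        q = np.array([
            np.sum(T[s, a] * (R[s, a] + discount_factor * V))
            for a in range(num_actions)
        ])
        policy[s, int(np.argmax(q))] = 1.0
    return policy, V, deltas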