csle_agents.agents.vi package

Submodules

csle_agents.agents.vi.vi_agent module

class csle_agents.agents.vi.vi_agent.VIAgent(simulation_env_config: csle_common.dao.simulation_config.simulation_env_config.SimulationEnvConfig, experiment_config: csle_common.dao.training.experiment_config.ExperimentConfig, training_job: Optional[csle_common.dao.jobs.training_job_config.TrainingJobConfig] = None, save_to_metastore: bool = True, env: Optional[csle_common.dao.simulation_config.base_env.BaseEnv] = None)[source]

Bases: csle_agents.agents.base.base_agent.BaseAgent

Value Iteration Agent

create_policy_from_value_function(num_states: int, num_actions: int, V: numpy.ndarray[Any, numpy.dtype[Any]], T: numpy.ndarray[Any, numpy.dtype[Any]], discount_factor: float, R: numpy.ndarray[Any, numpy.dtype[Any]]) numpy.ndarray[Any, numpy.dtype[Any]][source]

Creates a tabular policy from a value function

Parameters
  • num_states – the number of states

  • num_actions – the number of actions

  • V – the value function

  • T – the transition operator

  • discount_factor – the discount factor

  • R – the reward function

Returns

the tabular policy
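
A minimal sketch of the greedy-policy extraction this method performs, assuming T is indexed as T[a][s][s'] and R as R[a][s] (the actual implementation may use a different array layout):

    import numpy as np

    def create_policy_from_value_function_sketch(
            num_states: int, num_actions: int, V: np.ndarray,
            T: np.ndarray, discount_factor: float, R: np.ndarray) -> np.ndarray:
        policy = np.zeros((num_states, num_actions))
        for s in range(num_states):
            # Q(s, a) = R(a, s) + gamma * sum_{s'} T(a, s, s') * V(s')
            q_values = np.array([
                R[a][s] + discount_factor * np.dot(T[a][s], V)
                for a in range(num_actions)
            ])
            policy[s][np.argmax(q_values)] = 1.0  # one-hot greedy action
        return policy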

evaluate_policy(policy: numpy.ndarray[Any, numpy.dtype[Any]], eval_batch_size: int) float[source]

Evaluates a tabular policy

Parameters
  • policy – the tabular policy to evaluate

  • eval_batch_size – the batch size

Returns

the average return of the evaluated policy
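
A sketch of what such an evaluation might look like as Monte Carlo rollouts, assuming a gymnasium-style environment with integer states and reading eval_batch_size as the number of evaluation episodes (both are assumptions; the actual method evaluates in the agent's configured environment):

    import numpy as np

    def evaluate_policy_sketch(env, policy: np.ndarray,
                               eval_batch_size: int) -> float:
        # Roll out the greedy action of the one-hot tabular policy and
        # average the cumulative rewards over eval_batch_size episodes
        returns = []
        for _ in range(eval_batch_size):
            s, _ = env.reset()
            done, cumulative_reward = False, 0.0
            while not done:
                a = int(np.argmax(policy[s]))
                s, r, terminated, truncated, _ = env.step(a)
                done = terminated or truncated
                cumulative_reward += r
            returns.append(cumulative_reward)
        return float(np.mean(returns))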

hparam_names() List[str][source]

Returns

a list with the hyperparameter names

one_step_lookahead(state, V, num_actions, num_states, T, discount_factor, R) numpy.ndarray[Any, numpy.dtype[Any]][source]

Performs a one-step lookahead for value iteration

Parameters
  • state – the current state

  • V – the current value function

  • num_actions – the number of actions

  • num_states – the number of states

  • T – the transition kernel

  • discount_factor – the discount factor

  • R – the table with rewards

Returns

an array with lookahead values
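
What the lookahead computes, as a sketch under the same T[a][s][s'] / R[a][s] indexing assumption as above:

    import numpy as np

    def one_step_lookahead_sketch(state: int, V: np.ndarray, num_actions: int,
                                  T: np.ndarray, discount_factor: float,
                                  R: np.ndarray) -> np.ndarray:
        # A[a] = R(a, s) + gamma * sum_{s'} T(a, s, s') * V(s')
        A = np.zeros(num_actions)
        for a in range(num_actions):
            A[a] = R[a][state] + discount_factor * np.dot(T[a][state], V)
        return A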

train() csle_common.dao.training.experiment_execution.ExperimentExecution[source]

Runs the value iteration algorithm to compute V*

Returns

the results

value_iteration(exp_result: csle_common.dao.training.experiment_result.ExperimentResult, seed: int) csle_common.dao.training.experiment_result.ExperimentResult[source]

Runs the value iteration algorithm

Parameters
  • exp_result – the experiment result object

  • seed – the random seed

Returns

the updated experiment result

vi(T: numpy.ndarray[Any, numpy.dtype[Any]], num_states: int, num_actions: int, R: numpy.ndarray[Any, numpy.dtype[Any]], theta=0.0001, discount_factor=1.0) Tuple[numpy.ndarray[Any, numpy.dtype[Any]], numpy.ndarray[Any, numpy.dtype[Any]], List[Any], List[Any], List[Any]][source]

An implementation of the Value Iteration algorithm

Parameters
  • T – the transition kernel T

  • num_states – the number of states

  • num_actions – the number of actions

  • R – the table with rewards

  • theta – the convergence threshold

  • discount_factor – the discount factor

Returns

(greedy policy, value function, deltas, average_returns)
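
A compact sketch of standard value iteration matching this signature, under the same indexing assumptions as above; the real method also records per-iteration statistics such as average returns, which are omitted here:

    import numpy as np
    from typing import List, Tuple

    def vi_sketch(T: np.ndarray, num_states: int, num_actions: int,
                  R: np.ndarray, theta: float = 0.0001,
                  discount_factor: float = 1.0) \
            -> Tuple[np.ndarray, np.ndarray, List[float]]:
        V = np.zeros(num_states)
        deltas: List[float] = []
        while True:
            delta = 0.0
            for s in range(num_states):
                # One-step lookahead backup: V(s) <- max_a Q(s, a)
                q_values = np.array([
                    R[a][s] + discount_factor * np.dot(T[a][s], V)
                    for a in range(num_actions)
                ])
                delta = max(delta, abs(np.max(q_values) - V[s]))
                V[s] = np.max(q_values)
            deltas.append(delta)
            if delta < theta:  # largest update below the convergence threshold
                break
        # Extract the greedy one-hot policy from the converged value function
        policy = np.zeros((num_states, num_actions))
        for s in range(num_states):
            q_values = np.array([
                R[a][s] + discount_factor * np.dot(T[a][s], V)
                for a in range(num_actions)
            ])
            policy[s][np.argmax(q_values)] = 1.0
        return policy, V, deltas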

Module contents