csle_agents.agents.vi package

Submodules

csle_agents.agents.vi.vi_agent module

class csle_agents.agents.vi.vi_agent.VIAgent(simulation_env_config: SimulationEnvConfig, experiment_config: ExperimentConfig, training_job: Optional[TrainingJobConfig] = None, save_to_metastore: bool = True, env: Optional[BaseEnv] = None)[source]

Bases: BaseAgent

Value Iteration Agent

create_policy_from_value_function(num_states: int, num_actions: int, V: ndarray[Any, dtype[Any]], T: ndarray[Any, dtype[Any]], discount_factor: float, R: ndarray[Any, dtype[Any]]) ndarray[Any, dtype[Any]][source]

Creates a tabular policy from a value function

Parameters
  • num_states – the number of states

  • num_actions – the number of actions

  • V – the value function

  • T – the transition operator

  • discount_factor – the discount factor

  • R – the reward function

Returns

the tabular policy
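
As a rough illustration, the sketch below shows one way a greedy tabular policy can be extracted from a value function; the tensor layouts T[a][s][s'] and R[a][s] and the one-hot policy encoding are assumptions for the example, not necessarily the exact implementation.

import numpy as np

def create_policy_sketch(num_states, num_actions, V, T, discount_factor, R):
    # One-hot tabular policy: policy[s, a] = 1 for the greedy action a in state s (assumed encoding)
    policy = np.zeros((num_states, num_actions))
    for s in range(num_states):
        # Q(s, a) = sum_{s'} T[a][s][s'] * (R[a][s] + gamma * V[s'])  (assumed tensor layouts)
        q_values = [
            sum(T[a][s][s2] * (R[a][s] + discount_factor * V[s2]) for s2 in range(num_states))
            for a in range(num_actions)
        ]
        policy[s][int(np.argmax(q_values))] = 1.0
    return policy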

evaluate_policy(policy: ndarray[Any, dtype[Any]], eval_batch_size: int) float[source]

Evaluates a tabular policy

Parameters
  • policy – the tabular policy to evaluate

  • eval_batch_size – the batch size

Returns

the average reward of the evaluated policy
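
A hedged sketch of one way such an evaluation can be done, assuming a Gymnasium-style reset/step interface, integer state observations, and a one-hot tabular policy; the max_steps cap is illustrative and not part of the documented signature.

import numpy as np

def evaluate_tabular_policy_sketch(env, policy, eval_batch_size, max_steps=100):
    # Average undiscounted return over eval_batch_size episodes (assumed evaluation criterion)
    returns = []
    for _ in range(eval_batch_size):
        s, _ = env.reset()  # assumes a Gymnasium-style (observation, info) reset
        episode_return = 0.0
        for _ in range(max_steps):
            a = int(np.argmax(policy[s]))  # greedy action under the one-hot tabular policy
            s, r, terminated, truncated, _ = env.step(a)
            episode_return += r
            if terminated or truncated:
                break
        returns.append(episode_return)
    return float(np.mean(returns))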

hparam_names() List[str][source]

Returns

a list with the hyperparameter names

one_step_lookahead(state, V, num_actions, num_states, T, discount_factor, R) ndarray[Any, dtype[Any]][source]

Performs a one-step lookahead for value iteration

Parameters
  • state – the current state

  • V – the current value function

  • num_actions – the number of actions

  • num_states – the number of states

  • T – the transition kernel

  • discount_factor – the discount factor

  • R – the table with rewards

Returns

an array with lookahead values
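
The sketch below illustrates the Bellman one-step lookahead that such a method typically computes, again assuming the T[a][state][s'] and R[a][state] layouts for illustration.

import numpy as np

def one_step_lookahead_sketch(state, V, num_actions, num_states, T, discount_factor, R):
    # A[a] = sum_{s'} T[a][state][s'] * (R[a][state] + gamma * V[s'])  (assumed tensor layouts)
    A = np.zeros(num_actions)
    for a in range(num_actions):
        for s2 in range(num_states):
            A[a] += T[a][state][s2] * (R[a][state] + discount_factor * V[s2])
    return A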

train() ExperimentExecution[source]

Runs the value iteration algorithm to compute V*

Returns

the results
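
A hedged usage sketch of how the agent might be instantiated and trained; simulation_env_config and experiment_config are placeholders for configuration objects whose construction depends on the simulation being solved.

from csle_agents.agents.vi.vi_agent import VIAgent

# simulation_env_config and experiment_config are placeholders for previously
# constructed SimulationEnvConfig and ExperimentConfig objects; how they are
# built depends on the simulation being solved and is omitted here.
agent = VIAgent(simulation_env_config=simulation_env_config,
                experiment_config=experiment_config,
                save_to_metastore=False)
execution = agent.train()  # runs value iteration and returns an ExperimentExecution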

value_iteration(exp_result: ExperimentResult, seed: int) ExperimentResult[source]

Runs the value iteration algorithm

Parameters
  • exp_result – the experiment result object

  • seed – the random seed

Returns

the updated experiment result

vi(T: ndarray[Any, dtype[Any]], num_states: int, num_actions: int, R: ndarray[Any, dtype[Any]], theta=0.0001, discount_factor=1.0) Tuple[ndarray[Any, dtype[Any]], ndarray[Any, dtype[Any]], List[Any], List[Any], List[Any]][source]

An implementation of the Value Iteration algorithm

Parameters
  • T – the transition kernel T

  • num_states – the number of states

  • num_actions – the number of actions

  • R – the table with rewards

  • theta – convergence threshold

  • discount_factor – the discount factor

Returns

(greedy policy, value function, deltas, average_returns)
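
For illustration, a compact sketch of the value-iteration loop, assuming the same T[a][s][s'] and R[a][s] layouts as above; the return value is simplified to (greedy policy, value function, deltas) and omits the additional bookkeeping lists in the documented signature.

import numpy as np

def vi_sketch(T, num_states, num_actions, R, theta=1e-4, discount_factor=1.0):
    V = np.zeros(num_states)
    deltas = []
    while True:
        delta = 0.0
        for s in range(num_states):
            # Bellman optimality backup: V(s) <- max_a sum_{s'} T[a][s][s'] * (R[a][s] + gamma * V[s'])
            q_values = [
                sum(T[a][s][s2] * (R[a][s] + discount_factor * V[s2]) for s2 in range(num_states))
                for a in range(num_actions)
            ]
            best = max(q_values)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        deltas.append(delta)
        if delta < theta:  # stop when the largest update falls below the convergence threshold
            break
    # Extract the greedy (one-hot) policy from the converged value function
    policy = np.zeros((num_states, num_actions))
    for s in range(num_states):
        q_values = [
            sum(T[a][s][s2] * (R[a][s] + discount_factor * V[s2]) for s2 in range(num_states))
            for a in range(num_actions)
        ]
        policy[s][int(np.argmax(q_values))] = 1.0
    return policy, V, deltas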

Module contents