csle_agents.agents.vi package

Submodules

csle_agents.agents.vi.vi_agent module

class csle_agents.agents.vi.vi_agent.VIAgent(simulation_env_config: SimulationEnvConfig, experiment_config: ExperimentConfig, training_job: Optional[TrainingJobConfig] = None, save_to_metastore: bool = True, env: Optional[BaseEnv] = None)[source]

Bases: BaseAgent

Value Iteration Agent

create_policy_from_value_function(num_states: int, num_actions: int, V: ndarray[Any, dtype[Any]], T: ndarray[Any, dtype[Any]], discount_factor: float, R: ndarray[Any, dtype[Any]]) ndarray[Any, dtype[Any]][source]

Creates a tabular policy from a value function

Parameters
  • num_states – the number of states

  • num_actions – the number of actions

  • V – the value function

  • T – the transition operator

  • discount_factor – the discount factor

  • R – the reward function

Returns

the tabular policy
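
As a rough illustration, the sketch below shows one way a greedy tabular policy can be extracted from a value function; the tensor layouts T[a][s][s'] and R[a][s] and the one-hot policy encoding are assumptions for the example, not necessarily the exact implementation.

import numpy as np

def create_policy_sketch(num_states, num_actions, V, T, discount_factor, R):
    # One-hot tabular policy: policy[s, a] = 1 for the greedy action a in state s (assumed encoding)
    policy = np.zeros((num_states, num_actions))
    for s in range(num_states):
        # Q(s, a) = sum_{s'} T[a][s][s'] * (R[a][s] + gamma * V[s'])  (assumed tensor layouts)
        q_values = [
            sum(T[a][s][s2] * (R[a][s] + discount_factor * V[s2]) for s2 in range(num_states))
            for a in range(num_actions)
        ]
        policy[s][int(np.argmax(q_values))] = 1.0
    return policy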

evaluate_policy(policy: ndarray[Any, dtype[Any]], eval_batch_size: int) float[source]

Evaluates a tabular policy

Parameters
  • policy – the tabular policy to evaluate

  • eval_batch_size – the batch size

Returns

the average reward of the evaluated policy
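
A hedged sketch of one way such an evaluation can be done, assuming a Gymnasium-style reset/step interface, integer state observations, and a one-hot tabular policy; the max_steps cap is illustrative and not part of the documented signature.

import numpy as np

def evaluate_tabular_policy_sketch(env, policy, eval_batch_size, max_steps=100):
    # Average undiscounted return over eval_batch_size episodes (assumed evaluation criterion)
    returns = []
    for _ in range(eval_batch_size):
        s, _ = env.reset()  # assumes a Gymnasium-style (observation, info) reset
        episode_return = 0.0
        for _ in range(max_steps):
            a = int(np.argmax(policy[s]))  # greedy action under the one-hot tabular policy
            s, r, terminated, truncated, _ = env.step(a)
            episode_return += r
            if terminated or truncated:
                break
        returns.append(episode_return)
    return float(np.mean(returns))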

hparam_names() List[str][source]

Returns

a list with the hyperparameter names

one_step_lookahead(state, V, num_actions, num_states, T, discount_factor, R) ndarray[Any, dtype[Any]][source]

Performs a one-step lookahead for value iteration

Parameters
  • state – the current state

  • V – the current value function

  • num_actions – the number of actions

  • num_states – the number of states

  • T – the transition kernel

  • discount_factor – the discount factor

  • R – the table with rewards

Returns

an array with lookahead values
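
The sketch below illustrates the Bellman one-step lookahead that such a method typically computes, again assuming the T[a][state][s'] and R[a][state] layouts for illustration.

import numpy as np

def one_step_lookahead_sketch(state, V, num_actions, num_states, T, discount_factor, R):
    # A[a] = sum_{s'} T[a][state][s'] * (R[a][state] + gamma * V[s'])  (assumed tensor layouts)
    A = np.zeros(num_actions)
    for a in range(num_actions):
        for s2 in range(num_states):
            A[a] += T[a][state][s2] * (R[a][state] + discount_factor * V[s2])
    return A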

train() ExperimentExecution[source]

Runs the value iteration algorithm to compute V*

Returns

the results
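
A hedged usage sketch of how the agent might be instantiated and trained; simulation_env_config and experiment_config are placeholders for configuration objects whose construction depends on the simulation being solved.

from csle_agents.agents.vi.vi_agent import VIAgent

# simulation_env_config and experiment_config are placeholders for previously
# constructed SimulationEnvConfig and ExperimentConfig objects; how they are
# built depends on the simulation being solved and is omitted here.
agent = VIAgent(simulation_env_config=simulation_env_config,
                experiment_config=experiment_config,
                save_to_metastore=False)
execution = agent.train()  # runs value iteration and returns an ExperimentExecution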

value_iteration(exp_result: ExperimentResult, seed: int) ExperimentResult[source]

Runs the value iteration algorithm

Parameters
  • exp_result – the experiment result object

  • seed – the random seed

Returns

the updated experiment result

vi(T: ndarray[Any, dtype[Any]], num_states: int, num_actions: int, R: ndarray[Any, dtype[Any]], theta=0.0001, discount_factor=1.0) Tuple[ndarray[Any, dtype[Any]], ndarray[Any, dtype[Any]], List[Any], List[Any], List[Any]][source]

An implementation of the Value Iteration algorithm

Parameters
  • T – the transition kernel T

  • num_states – the number of states

  • num_actions – the number of actions

  • R – the table with rewards

  • theta – convergence threshold

  • discount_factor – the discount factor

Returns

(greedy policy, value function, deltas, average_returns)
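
For illustration, a compact sketch of the value-iteration loop, assuming the same T[a][s][s'] and R[a][s] layouts as above; the return value is simplified to (greedy policy, value function, deltas) and omits the additional bookkeeping lists in the documented signature.

import numpy as np

def vi_sketch(T, num_states, num_actions, R, theta=1e-4, discount_factor=1.0):
    V = np.zeros(num_states)
    deltas = []
    while True:
        delta = 0.0
        for s in range(num_states):
            # Bellman optimality backup: V(s) <- max_a sum_{s'} T[a][s][s'] * (R[a][s] + gamma * V[s'])
            q_values = [
                sum(T[a][s][s2] * (R[a][s] + discount_factor * V[s2]) for s2 in range(num_states))
                for a in range(num_actions)
            ]
            best = max(q_values)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        deltas.append(delta)
        if delta < theta:  # stop when the largest update falls below the convergence threshold
            break
    # Extract the greedy (one-hot) policy from the converged value function
    policy = np.zeros((num_states, num_actions))
    for s in range(num_states):
        q_values = [
            sum(T[a][s][s2] * (R[a][s] + discount_factor * V[s2]) for s2 in range(num_states))
            for a in range(num_actions)
        ]
        policy[s][int(np.argmax(q_values))] = 1.0
    return policy, V, deltas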

Module contents