algorithms.policy package

Submodules

algorithms.policy.GRUCell module

class algorithms.policy.GRUCell.GRUCell(input_size, hidden_size)

Bases: torch.nn.modules.module.Module

forward(x, h=None)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the forward pass must be defined within this function, call the Module instance itself rather than forward() directly: calling the instance runs the registered hooks, while calling forward() directly silently ignores them.
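The difference between calling the instance and calling forward() can be illustrated without PyTorch. The `TinyModule` class below is a hypothetical stand-in that only mimics the pattern in which `__call__` wraps `forward` and then runs registered hooks; it is a sketch of the idea, not the `torch.nn.Module` implementation.

```python
class TinyModule:
    """Hypothetical stand-in that mimics how a module's __call__ wraps forward()."""

    def __init__(self):
        self._forward_hooks = []

    def register_forward_hook(self, hook):
        # Hooks receive (module, inputs, output), similar in spirit to PyTorch.
        self._forward_hooks.append(hook)

    def forward(self, x):
        return 2 * x

    def __call__(self, *inputs):
        output = self.forward(*inputs)
        for hook in self._forward_hooks:
            hook(self, inputs, output)
        return output


seen = []
module = TinyModule()
module.register_forward_hook(lambda mod, inputs, output: seen.append(output))

via_call = module(3)             # runs forward() and then the hook
via_forward = module.forward(3)  # same result, but the hook is skipped
```

Both calls return the same value, but only the call through the instance is recorded in `seen`.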

algorithms.policy.GRUNetwork module

class algorithms.policy.GRUNetwork.GRUNetwork(input_dim, output_dim, hidden_dim, gru_layer=<class 'algorithms.policy.GRUCell.GRUCell'>, output_nonlinearity=None)

Bases: torch.nn.modules.module.Module

forward(x, h=None)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the forward pass must be defined within this function, call the Module instance itself rather than forward() directly: calling the instance runs the registered hooks, while calling forward() directly silently ignores them.

algorithms.policy.GaussianGRUPolicy module

class algorithms.policy.GaussianGRUPolicy.GaussianGRUPolicy(env_spec, hidden_dim=32, feature_network=None, state_include_action=True, gru_layer=<class 'algorithms.policy.GRUCell.GRUCell'>, output_nonlinearity=None, mode: int = 0, log_std=0, cuda_enable=True)

Bases: torch.nn.modules.module.Module

property action_space

dist_info_sym(obs_var, state_info_vars)

property distribution

forward(x, h=None)

Parameters

x – input feature
h – hidden state

Returns

the output mean and log std of the action distribution, and the hidden state for the next step
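A mean and log std are typically turned into an action by sampling from a diagonal Gaussian. The sketch below shows that step in plain Python; the helper name `sample_action` and the list-based inputs are assumptions made for illustration and are not part of this package.

```python
import math
import random


def sample_action(mean, log_std, rng=random):
    """Draw one action from a diagonal Gaussian given per-dimension mean and log std."""
    return [m + math.exp(ls) * rng.gauss(0.0, 1.0) for m, ls in zip(mean, log_std)]


rng = random.Random(0)
mean = [0.5, -1.0]
log_std = [-30.0, -30.0]  # a tiny std, so samples stay very close to the mean
action = sample_action(mean, log_std, rng)
```

Parameterizing the spread as a log std keeps the standard deviation positive for any real-valued network output.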

get_action(observation)

Parameters: observation – input observation
Returns: the action for the given observation

get_actions(observations)

Parameters: observations – a batch of observations
Returns: the corresponding batch of actions

get_actions_with_prev(observations, prev_actions, prev_hiddens)

Parameters

observations – input batch of observations
prev_actions – previous batch of actions
prev_hiddens – previous batch of hidden states

Returns

actions for the current batch of observations

get_fim(x, actions)

Parameters

x – input observation features
actions – input actions

Returns

the Fisher information matrix

get_kl(x, actions, h=None)

Parameters

x – input feature
actions – actions
h – hidden state

Returns

KL divergence between the updated policy and the old one
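For Gaussian policies with diagonal covariance, the KL divergence has a closed form. The sketch below computes KL(old || new) from means and log stds in plain Python; the function name, the argument layout, and the direction of the divergence are assumptions, since the source does not state how get_kl computes its result.

```python
import math


def diag_gaussian_kl(old_mean, old_log_std, new_mean, new_log_std):
    """KL(old || new) for two diagonal Gaussians, summed over action dimensions."""
    total = 0.0
    for m0, ls0, m1, ls1 in zip(old_mean, old_log_std, new_mean, new_log_std):
        var0 = math.exp(2.0 * ls0)
        var1 = math.exp(2.0 * ls1)
        total += ls1 - ls0 + (var0 + (m0 - m1) ** 2) / (2.0 * var1) - 0.5
    return total


# Identical distributions have zero divergence.
same = diag_gaussian_kl([0.0, 1.0], [0.0, -0.5], [0.0, 1.0], [0.0, -0.5])
# Unit-variance Gaussians whose means differ by 1 have a divergence of 0.5.
shifted = diag_gaussian_kl([0.0], [0.0], [1.0], [0.0])
```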

get_log_prob(x, actions)

Parameters

x – input observation features
actions – input actions

Returns

log-likelihood of the actions under the distribution output by the network
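As a rough illustration of this quantity, the log-likelihood of an action under a diagonal Gaussian can be written directly from the mean and log std. The helper below is a hypothetical plain-Python sketch, not the method's actual implementation.

```python
import math


def diag_gaussian_log_prob(actions, mean, log_std):
    """Log density of `actions` under a diagonal Gaussian, summed over dimensions."""
    total = 0.0
    for a, m, ls in zip(actions, mean, log_std):
        z = (a - m) / math.exp(ls)  # standardized distance from the mean
        total += -0.5 * z * z - ls - 0.5 * math.log(2.0 * math.pi)
    return total


at_mean = diag_gaussian_log_prob([0.0], [0.0], [0.0])
off_mean = diag_gaussian_log_prob([1.0], [0.0], [0.0])
```

An action at the mean gets the highest log-likelihood; moving one standard deviation away lowers it by 0.5.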

load_param(param_path: str)

Parameters: param_path – path to the saved parameter file
Returns: None. Loads the saved parameters into the current model.

property observation_space

property recurrent

reset(dones=None)

Parameters: dones – indicators of whether each agent has finished its episode
Returns: None. Updates the policy's internal information according to the given list of dones.
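The source does not say which information reset updates. A common pattern for recurrent policies is to clear the hidden state of every agent whose episode has ended, and the sketch below shows only that pattern under this assumption; `reset_hidden` and its arguments are hypothetical names.

```python
def reset_hidden(hiddens, dones, hidden_dim):
    """Return a copy of `hiddens` with a zero vector wherever `dones` is true."""
    return [
        [0.0] * hidden_dim if done else list(h)
        for h, done in zip(hiddens, dones)
    ]


hiddens = [[0.3, -0.1], [0.7, 0.2], [-0.4, 0.9]]
dones = [False, True, False]
hiddens = reset_hidden(hiddens, dones, hidden_dim=2)
```

Only the second agent is marked as done, so only its hidden state is zeroed.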

property state_info_specs

property vectorized

algorithms.policy.GaussianMLPBaseline module

class algorithms.policy.GaussianMLPBaseline.GaussianMLP(input_dim, output_dim, mean_network=None, optimizer=None, hidden_size=(32, 32), step_size=0.01, init_std=1.0, normalize_inputs=True, normalize_outputs=True, subsample_factor=1.0, max_itr=20)

Bases: torch.nn.modules.module.Module

fit(xs, ys)

Parameters

xs – input features
ys – ground-truth targets

Returns

None. Fits the model to the given data.

forward(x)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the forward pass must be defined within this function, call the Module instance itself rather than forward() directly: calling the instance runs the registered hooks, while calling forward() directly silently ignores them.

predict(xs)

Parameters: xs – input features
Returns: the model's predicted y for the input features

class algorithms.policy.GaussianMLPBaseline.GaussianMLPBaseline(env_spec, subsample_factor=1, num_seq_inputs=1, regressor_args=None)

Bases: object

Baseline model to reduce variance

fit(paths)

Parameters: paths – observations and rewards
Returns: None. Fits the baseline model to the given paths.

parameters()

predict(path)

Parameters: path – a path containing the observations
Returns: the predicted reward for the given observations
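A baseline reduces variance when its predictions are subtracted from the observed returns before the policy update. The sketch below shows that subtraction in plain Python; the discounting scheme, the function names, and the sample numbers are assumptions for illustration and do not describe how this class is used internally.

```python
def discounted_returns(rewards, discount):
    """Compute the discounted return at each time step of one path."""
    returns = []
    running = 0.0
    for reward in reversed(rewards):
        running = reward + discount * running
        returns.append(running)
    returns.reverse()
    return returns


def advantages(rewards, baseline_values, discount):
    """Subtract baseline predictions from discounted returns."""
    returns = discounted_returns(rewards, discount)
    return [g - b for g, b in zip(returns, baseline_values)]


rewards = [1.0, 1.0, 1.0]
baseline_values = [2.0, 1.5, 1.0]  # hypothetical predictions for one path
adv = advantages(rewards, baseline_values, discount=0.5)
```

The better the baseline matches the actual returns, the closer the resulting values are to zero, which is what lowers the variance of the gradient estimate.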

set_cuda()

algorithms.policy.MLP module

class algorithms.policy.MLP.MLP(input_size, hidden_size, output_size)

Bases: torch.nn.modules.module.Module

forward(x)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the forward pass must be defined within this function, call the Module instance itself rather than forward() directly: calling the instance runs the registered hooks, while calling forward() directly silently ignores them.
