algorithms.policy package

Submodules

algorithms.policy.GRUCell module

class algorithms.policy.GRUCell.GRUCell(input_size, hidden_size)[source]

Bases: torch.nn.modules.module.Module

forward(x, h=None)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the forward pass must be defined in this method, call the Module instance itself rather than forward() directly: calling the instance runs the registered hooks, whereas calling forward() directly silently skips them.
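For orientation, a textbook GRU cell maps an input x and a previous hidden state h to a new hidden state through an update gate, a reset gate, and a candidate state. The sketch below is a scalar, pure-Python illustration of that recurrence. It is an assumption that algorithms.policy.GRUCell follows this form; the weight names are hypothetical, and some implementations swap the roles of z and 1 - z.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gru_step(x, h, w):
    """One scalar GRU step; `w` is a dict of hypothetical weights."""
    z = sigmoid(w["wz"] * x + w["uz"] * h + w["bz"])  # update gate
    r = sigmoid(w["wr"] * x + w["ur"] * h + w["br"])  # reset gate
    n = math.tanh(w["wn"] * x + r * (w["un"] * h) + w["bn"])  # candidate state
    return (1.0 - z) * n + z * h  # blend the candidate with the old state


# With all weights at zero, both gates equal 0.5 and the candidate is 0.
weights = {k: 0.0 for k in ("wz", "uz", "bz", "wr", "ur", "br", "wn", "un", "bn")}
h_next = gru_step(1.0, 0.5, weights)
```

Feeding h_next back in as h on the next call is what makes the cell recurrent, which matches the optional h argument of forward(x, h=None).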

algorithms.policy.GRUNetwork module

class algorithms.policy.GRUNetwork.GRUNetwork(input_dim, output_dim, hidden_dim, gru_layer=<class 'algorithms.policy.GRUCell.GRUCell'>, output_nonlinearity=None)[source]

Bases: torch.nn.modules.module.Module

forward(x, h=None)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the forward pass must be defined in this method, call the Module instance itself rather than forward() directly: calling the instance runs the registered hooks, whereas calling forward() directly silently skips them.

algorithms.policy.GaussianGRUPolicy module

class algorithms.policy.GaussianGRUPolicy.GaussianGRUPolicy(env_spec, hidden_dim=32, feature_network=None, state_include_action=True, gru_layer=<class 'algorithms.policy.GRUCell.GRUCell'>, output_nonlinearity=None, mode: int = 0, log_std=0, cuda_enable=True)[source]

Bases: torch.nn.modules.module.Module

property action_space
dist_info_sym(obs_var, state_info_vars)[source]
property distribution
forward(x, h=None)[source]
Parameters
  • x – input feature

  • h – hidden state (defaults to None)

Returns

the action mean, the action log standard deviation, and the hidden state to pass to the next step
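A mean and log standard deviation of this kind are typically turned into an action by sampling from the corresponding diagonal Gaussian. The following is a minimal sketch of that step in pure Python; the function name sample_action is hypothetical and the policy's own sampling code may differ.

```python
import math
import random


def sample_action(mean, log_std, rng):
    """Draw one action per dimension from N(mean, exp(log_std)^2)."""
    return [m + math.exp(s) * rng.gauss(0.0, 1.0) for m, s in zip(mean, log_std)]


rng = random.Random(0)
mean = [0.1, -0.3]
log_std = [0.0, 0.0]  # a log std of 0 corresponds to a standard deviation of 1
action = sample_action(mean, log_std, rng)
```

Working with the log of the standard deviation keeps the standard deviation positive for any real-valued network output.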

get_action(observation)[source]
Parameters

observation – input observation

Returns

the action for the given observation

get_actions(observations)[source]
Parameters

observations – a batch of observations

Returns

the corresponding batch of actions

get_actions_with_prev(observations, prev_actions, prev_hiddens)[source]
Parameters
  • observations – input batch of observations

  • prev_actions – previous batch of actions

  • prev_hiddens – previous hidden states

Returns

actions for the current batch of observations

get_fim(x, actions)[source]
Parameters
  • x – input observation feature

  • actions – input actions

Returns

the Fisher information matrix
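As background, for a diagonal Gaussian parameterised by its mean and log standard deviation, the Fisher information with respect to those two distribution parameters is itself diagonal: 1/sigma^2 for the mean and 2 for the log standard deviation. The sketch below computes those entries. It is an assumption that get_fim relates to this quantity; the method may instead return the matrix with respect to the network parameters, and the function name here is hypothetical.

```python
import math


def diag_gaussian_fim(log_std):
    """Diagonal Fisher information w.r.t. (mean, log_std), per action dimension."""
    fim_mean = [math.exp(-2.0 * s) for s in log_std]  # 1 / sigma^2
    fim_log_std = [2.0 for _ in log_std]  # constant in this parameterisation
    return fim_mean, fim_log_std


fim_mean, fim_log_std = diag_gaussian_fim([0.0, math.log(2.0)])
```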

get_kl(x, actions, h=None)[source]
Parameters
  • x – input feature

  • actions – actions

  • h – hidden state (defaults to None)

Returns

KL divergence between the updated policy and the old one
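For two diagonal Gaussian policies, the KL divergence has a closed form in terms of their means and log standard deviations. The sketch below computes KL(old || new) summed over action dimensions. The direction of the divergence and the function name are assumptions; the documentation does not state which direction get_kl uses.

```python
import math


def diag_gaussian_kl(mean_old, log_std_old, mean_new, log_std_new):
    """KL(old || new) between diagonal Gaussians, summed over dimensions."""
    kl = 0.0
    for m0, s0, m1, s1 in zip(mean_old, log_std_old, mean_new, log_std_new):
        var0, var1 = math.exp(2.0 * s0), math.exp(2.0 * s1)
        kl += s1 - s0 + (var0 + (m0 - m1) ** 2) / (2.0 * var1) - 0.5
    return kl


kl_same = diag_gaussian_kl([0.0], [0.0], [0.0], [0.0])
kl_shift = diag_gaussian_kl([0.0], [0.0], [1.0], [0.0])
```

Identical distributions give a divergence of zero, and shifting the mean by one standard deviation at unit variance gives 0.5.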

get_log_prob(x, actions)[source]
Parameters
  • x – input observation feature

  • actions – input actions

Returns

log likelihood of the actions given the distribution output by the network
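The log likelihood of an action under a diagonal Gaussian can be written directly from the mean and log standard deviation. The following is a minimal pure-Python sketch of that formula; the function name is hypothetical, and it is an assumption that the network outputs are used this way inside get_log_prob.

```python
import math


def diag_gaussian_log_prob(action, mean, log_std):
    """Sum over dimensions of log N(action | mean, exp(log_std)^2)."""
    total = 0.0
    for a, m, s in zip(action, mean, log_std):
        z = (a - m) / math.exp(s)  # standardised residual
        total += -0.5 * z * z - s - 0.5 * math.log(2.0 * math.pi)
    return total


lp_at_mean = diag_gaussian_log_prob([0.2], [0.2], [0.0])
```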

load_param(param_path: str)[source]
Parameters

param_path – saved parameter file path

Returns

None. The saved parameters are loaded into the current model.

property observation_space
property recurrent
reset(dones=None)[source]
Parameters

dones – flags indicating whether each agent has finished its episode

Returns

None. Updates internal state according to the given list of dones.
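One common pattern for a recurrent policy is to clear the hidden state of every agent whose episode has ended, so that the next episode starts from a fresh state. The sketch below illustrates that pattern with plain lists. It is hypothetical: the documentation does not say which information reset actually updates, and reset_hidden is not part of this package.

```python
def reset_hidden(hiddens, dones, hidden_dim):
    """Return a copy of `hiddens` with finished agents' states zeroed."""
    return [
        [0.0] * hidden_dim if done else list(h)
        for h, done in zip(hiddens, dones)
    ]


hiddens = [[0.2, -0.1], [0.7, 0.4]]
new_hiddens = reset_hidden(hiddens, dones=[True, False], hidden_dim=2)
```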

property state_info_specs
property vectorized

algorithms.policy.GaussianMLPBaseline module

class algorithms.policy.GaussianMLPBaseline.GaussianMLP(input_dim, output_dim, mean_network=None, optimizer=None, hidden_size=(32, 32), step_size=0.01, init_std=1.0, normalize_inputs=True, normalize_outputs=True, subsample_factor=1.0, max_itr=20)[source]

Bases: torch.nn.modules.module.Module

fit(xs, ys)[source]
Parameters
  • xs – input features

  • ys – ground-truth targets

Returns

None. Fits the model to the given data.

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the forward pass must be defined in this method, call the Module instance itself rather than forward() directly: calling the instance runs the registered hooks, whereas calling forward() directly silently skips them.

predict(xs)[source]
Parameters

xs – input feature

Returns

the model's predicted y for the input features

class algorithms.policy.GaussianMLPBaseline.GaussianMLPBaseline(env_spec, subsample_factor=1, num_seq_inputs=1, regressor_args=None)[source]

Bases: object

Baseline model to reduce variance

fit(paths)[source]
Parameters

paths – paths containing observations and rewards

Returns

fits the baseline model to the given paths

parameters()[source]
predict(path)[source]
Parameters

path – a path providing the observations

Returns

the predicted reward for the given observations

set_cuda()[source]
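The variance-reduction role of a baseline such as GaussianMLPBaseline can be sketched as follows: the baseline's prediction is subtracted from the discounted return at each step, and the difference is used as the advantage. This is a minimal illustration under stated assumptions; the discount factor gamma, both function names, and the use of plain lists are hypothetical and not taken from this package.

```python
def discounted_returns(rewards, gamma):
    """Discounted sum of future rewards at every time step."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def advantages(rewards, baseline_values, gamma):
    """Return minus baseline prediction, step by step."""
    rets = discounted_returns(rewards, gamma)
    return [g - b for g, b in zip(rets, baseline_values)]


rets = discounted_returns([1.0, 1.0, 1.0], gamma=0.5)
adv = advantages([1.0, 1.0, 1.0], [1.0, 1.0, 1.0], gamma=0.5)
```

Here baseline_values stands in for what a fitted baseline would predict for each step of a path.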

algorithms.policy.MLP module

class algorithms.policy.MLP.MLP(input_size, hidden_size, output_size)[source]

Bases: torch.nn.modules.module.Module

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the forward pass must be defined in this method, call the Module instance itself rather than forward() directly: calling the instance runs the registered hooks, whereas calling forward() directly silently skips them.
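The difference between calling a module instance and calling forward() directly can be seen with a forward hook. The sketch below uses torch.nn.Linear as a stand-in, since the internals of the classes in this package are not shown here; the same behaviour applies to any torch.nn.Module subclass.

```python
import torch
from torch import nn

calls = []
layer = nn.Linear(3, 2)
# Record every time the forward hook fires.
layer.register_forward_hook(lambda module, inputs, output: calls.append("hook"))

x = torch.zeros(1, 3)
layer(x)          # goes through __call__, so the hook fires
layer.forward(x)  # bypasses __call__, so the hook is skipped
```

After both calls, the hook has been recorded only once.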

Module contents