algorithms.RL_Algorithm.GAIL package

Submodules

algorithms.RL_Algorithm.GAIL.gail module

class algorithms.RL_Algorithm.GAIL.gail.GAIL(env, policy, baseline, critic=None, recognition=None, step_size=0.01, reward_handler=<algorithms.RL_Algorithm.utils.RewardHandler object>, saver=None, saver_filepath=None, validator=None, snapshot_env=True, scope=None, n_itr=500, start_itr=0, batch_size=5000, max_path_length=500, discount=0.99, gae_lambda=1, plot=False, pause_for_plot=False, center_adv=True, positive_adv=False, store_paths=False, whole_paths=True, fixed_horizon=False, sampler_cls=None, sampler_args=None, force_batch_sampler=False, max_kl=None, damping=None, l2_reg=None, policy_filepath=None, critic_filepath=None, env_filepath=None, cuda_enable=True, args=None)[source]

Bases: object
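
A minimal usage sketch (an illustration, not taken from the source): the env, policy, baseline, and critic objects are placeholders assumed to be constructed elsewhere in the repository, and the keyword values shown are the defaults from the signature above.

from algorithms.RL_Algorithm.GAIL.gail import GAIL

algo = GAIL(
    env=env,              # sampling environment (placeholder)
    policy=policy,        # policy to train (placeholder)
    baseline=baseline,    # baseline for advantage estimation (placeholder)
    critic=critic,        # discriminator providing surrogate rewards (placeholder)
    n_itr=500,            # training iterations (default)
    batch_size=5000,      # samples per iteration (default)
    max_path_length=500,  # trajectory length cap (default)
    discount=0.99,        # discount factor (default)
)
algo.train()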

get_itr_snapshot(itr, samples_data)[source]

Snapshot the critic and recognition model as well as the policy

init_env(itr)[source]
load()[source]

Load parameters from a filepath. Symmetric to _save. This is not ideal, but it’s easier than keeping track of everything separately.

load_critic(critic_param_path)[source]
load_policy(policy_param_path)[source]
log_diagnostics(paths)[source]
obtain_samples(itr)[source]
optimize_policy(itr, samples_data)[source]

Update the critic and recognition model in addition to the policy

Args:
    itr: iteration counter
    samples_data: dictionary resulting from process_samples, with keys
        'rewards', 'observations', 'agent_infos', 'env_infos', 'returns',
        'actions', 'advantages', 'paths'

The values in the infos dicts can be accessed, for example, as
samples_data['agent_infos']['prob'], and the returned value will be an
array of shape (batch_size, prob_dim).
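
A hedged sketch of reading from this dictionary, using only the keys named above; samples_data is assumed to be the dictionary passed to optimize_policy, and the shapes follow the docstring.

rewards = samples_data['rewards']            # per-step rewards, augmented by the critic
advantages = samples_data['advantages']      # advantage estimates
probs = samples_data['agent_infos']['prob']  # array of shape (batch_size, prob_dim)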

process_samples(itr, paths)[source]

Augment path rewards with critic and recognition model rewards

Args:
    itr: iteration counter
    paths: list of dictionaries, each containing info for a single
        trajectory, with keys 'observations', 'actions', 'agent_infos',
        'env_infos', 'rewards'
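
A sketch of one entry in paths, following the keys listed above; the array sizes (T steps, obs_dim and act_dim features) are illustrative assumptions, not from the source.

import numpy as np

T, obs_dim, act_dim = 200, 66, 2  # illustrative sizes only
path = {
    'observations': np.zeros((T, obs_dim)),
    'actions': np.zeros((T, act_dim)),
    'rewards': np.zeros(T),
    'agent_infos': {'prob': np.zeros((T, act_dim))},
    'env_infos': {},
}
paths = [path]
# process_samples augments path['rewards'] with critic and recognition
# model rewards (presumably combined by the reward_handler passed to GAIL)
# before advantages are computed.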

shutdown_worker()[source]
start_worker()[source]
train()[source]
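
The body of train() is not shown on this page; the following is a hedged reconstruction inferred purely from the method names above (an rllab-style batch-optimization loop), not the verbatim implementation.

# Assumed structure of train(); the actual code may differ.
algo.start_worker()                                  # set up the sampler
for itr in range(algo.start_itr, algo.n_itr):
    paths = algo.obtain_samples(itr)                 # roll out the policy
    samples_data = algo.process_samples(itr, paths)  # augment rewards, compute advantages
    algo.optimize_policy(itr, samples_data)          # update policy, critic, recognition model
    algo.log_diagnostics(paths)
    snapshot = algo.get_itr_snapshot(itr, samples_data)
algo.shutdown_worker()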
algorithms.RL_Algorithm.GAIL.gail.ones(*sizes, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor

Returns a tensor filled with the scalar value 1, with the shape defined by the variable argument sizes.

Args:
    sizes (int...): a sequence of integers defining the shape of the
        output tensor. Can be a variable number of arguments or a
        collection like a list or tuple.
    out (Tensor, optional): the output tensor.
    dtype (torch.dtype, optional): the desired data type of returned
        tensor. Default: if None, uses a global default (see
        torch.set_default_tensor_type()).
    layout (torch.layout, optional): the desired layout of returned
        Tensor. Default: torch.strided.
    device (torch.device, optional): the desired device of returned
        tensor. Default: if None, uses the current device for the default
        tensor type (see torch.set_default_tensor_type()). device will be
        the CPU for CPU tensor types and the current CUDA device for CUDA
        tensor types.
    requires_grad (bool, optional): If autograd should record operations
        on the returned tensor. Default: False.

Example:

>>> torch.ones(2, 3)
tensor([[ 1.,  1.,  1.],
        [ 1.,  1.,  1.]])

>>> torch.ones(5)
tensor([ 1.,  1.,  1.,  1.,  1.])
algorithms.RL_Algorithm.GAIL.gail.tensor(data, dtype=None, device=None, requires_grad=False, pin_memory=False) → Tensor

Constructs a tensor with data.

Warning

torch.tensor() always copies data. If you have a Tensor data and want to avoid a copy, use torch.Tensor.requires_grad_() or torch.Tensor.detach(). If you have a NumPy ndarray and want to avoid a copy, use torch.as_tensor().

Warning

When data is a tensor x, torch.tensor() reads out ‘the data’ from whatever it is passed, and constructs a leaf variable. Therefore torch.tensor(x) is equivalent to x.clone().detach() and torch.tensor(x, requires_grad=True) is equivalent to x.clone().detach().requires_grad_(True). The equivalents using clone() and detach() are recommended.
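
A short doctest-style example of the copy behavior described in the two warnings above:

>>> import numpy as np
>>> a = np.ones(3)
>>> torch.as_tensor(a)   # shares memory with a, no copy
tensor([1., 1., 1.], dtype=torch.float64)
>>> torch.tensor(a)      # always copies
tensor([1., 1., 1.], dtype=torch.float64)
>>> x = torch.ones(3)
>>> x.clone().detach()   # recommended over torch.tensor(x)
tensor([1., 1., 1.])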

Args:
    data (array_like): Initial data for the tensor. Can be a list, tuple,
        NumPy ndarray, scalar, and other types.
    dtype (torch.dtype, optional): the desired data type of returned
        tensor. Default: if None, infers data type from data.
    device (torch.device, optional): the desired device of returned
        tensor. Default: if None, uses the current device for the default
        tensor type (see torch.set_default_tensor_type()). device will be
        the CPU for CPU tensor types and the current CUDA device for CUDA
        tensor types.
    requires_grad (bool, optional): If autograd should record operations
        on the returned tensor. Default: False.
    pin_memory (bool, optional): If set, returned tensor would be
        allocated in the pinned memory. Works only for CPU tensors.
        Default: False.

Example:

>>> torch.tensor([[0.1, 1.2], [2.2, 3.1], [4.9, 5.2]])
tensor([[ 0.1000,  1.2000],
        [ 2.2000,  3.1000],
        [ 4.9000,  5.2000]])

>>> torch.tensor([0, 1])  # Type inference on data
tensor([ 0,  1])

>>> torch.tensor([[0.11111, 0.222222, 0.3333333]],
                 dtype=torch.float64,
                 device=torch.device('cuda:0'))  # creates a torch.cuda.DoubleTensor
tensor([[ 0.1111,  0.2222,  0.3333]], dtype=torch.float64, device='cuda:0')

>>> torch.tensor(3.14159)  # Create a scalar (zero-dimensional tensor)
tensor(3.1416)

>>> torch.tensor([])  # Create an empty tensor (of size (0,))
tensor([])
algorithms.RL_Algorithm.GAIL.gail.zeros(*sizes, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor

Returns a tensor filled with the scalar value 0, with the shape defined by the variable argument sizes.

Args:
    sizes (int...): a sequence of integers defining the shape of the
        output tensor. Can be a variable number of arguments or a
        collection like a list or tuple.
    out (Tensor, optional): the output tensor.
    dtype (torch.dtype, optional): the desired data type of returned
        tensor. Default: if None, uses a global default (see
        torch.set_default_tensor_type()).
    layout (torch.layout, optional): the desired layout of returned
        Tensor. Default: torch.strided.
    device (torch.device, optional): the desired device of returned
        tensor. Default: if None, uses the current device for the default
        tensor type (see torch.set_default_tensor_type()). device will be
        the CPU for CPU tensor types and the current CUDA device for CUDA
        tensor types.
    requires_grad (bool, optional): If autograd should record operations
        on the returned tensor. Default: False.

Example:

>>> torch.zeros(2, 3)
tensor([[ 0.,  0.,  0.],
        [ 0.,  0.,  0.]])

>>> torch.zeros(5)
tensor([ 0.,  0.,  0.,  0.,  0.])

Module contents