algorithms.RL_Algorithm.optimizers package

Submodules

algorithms.RL_Algorithm.optimizers.trpo module

algorithms.RL_Algorithm.optimizers.trpo.conjugate_gradients(Avp_f, b, nsteps, rdotr_tol=1e-10)[source]
Parameters
  • Avp_f – Hessian Vector function

  • b – negative loss gradient

  • nsteps – how many steps to search

  • rdotr_tol – the minimum improvement of rdotr

Returns

Parameters
  • model – our policy model

  • f – evaluation function

  • x – params of the model

  • fullstep – full step size

  • expected_improve_full – expected improve

  • max_backtracks – max iterative steps .5^n

  • accept_ratio – accepted improving rate

Returns

a boolean var indicating if the update step is success, if true return new param

algorithms.RL_Algorithm.optimizers.trpo.ones(*sizes, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor

Returns a tensor filled with the scalar value 1, with the shape defined by the variable argument sizes.

Args:
sizes (int…): a sequence of integers defining the shape of the output tensor.

Can be a variable number of arguments or a collection like a list or tuple.

out (Tensor, optional): the output tensor dtype (torch.dtype, optional): the desired data type of returned tensor.

Default: if None, uses a global default (see torch.set_default_tensor_type()).

layout (torch.layout, optional): the desired layout of returned Tensor.

Default: torch.strided.

device (torch.device, optional): the desired device of returned tensor.

Default: if None, uses the current device for the default tensor type (see torch.set_default_tensor_type()). device will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.

requires_grad (bool, optional): If autograd should record operations on the

returned tensor. Default: False.

Example:

>>> torch.ones(2, 3)
tensor([[ 1.,  1.,  1.],
        [ 1.,  1.,  1.]])

>>> torch.ones(5)
tensor([ 1.,  1.,  1.,  1.,  1.])
algorithms.RL_Algorithm.optimizers.trpo.tensor(data, dtype=None, device=None, requires_grad=False, pin_memory=False) → Tensor

Constructs a tensor with data.

Warning

torch.tensor() always copies data. If you have a Tensor data and want to avoid a copy, use torch.Tensor.requires_grad_() or torch.Tensor.detach(). If you have a NumPy ndarray and want to avoid a copy, use torch.as_tensor().

Warning

When data is a tensor x, torch.tensor() reads out ‘the data’ from whatever it is passed, and constructs a leaf variable. Therefore torch.tensor(x) is equivalent to x.clone().detach() and torch.tensor(x, requires_grad=True) is equivalent to x.clone().detach().requires_grad_(True). The equivalents using clone() and detach() are recommended.

Args:
data (array_like): Initial data for the tensor. Can be a list, tuple,

NumPy ndarray, scalar, and other types.

dtype (torch.dtype, optional): the desired data type of returned tensor.

Default: if None, infers data type from data.

device (torch.device, optional): the desired device of returned tensor.

Default: if None, uses the current device for the default tensor type (see torch.set_default_tensor_type()). device will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.

requires_grad (bool, optional): If autograd should record operations on the

returned tensor. Default: False.

pin_memory (bool, optional): If set, returned tensor would be allocated in

the pinned memory. Works only for CPU tensors. Default: False.

Example:

>>> torch.tensor([[0.1, 1.2], [2.2, 3.1], [4.9, 5.2]])
tensor([[ 0.1000,  1.2000],
        [ 2.2000,  3.1000],
        [ 4.9000,  5.2000]])

>>> torch.tensor([0, 1])  # Type inference on data
tensor([ 0,  1])

>>> torch.tensor([[0.11111, 0.222222, 0.3333333]],
                 dtype=torch.float64,
                 device=torch.device('cuda:0'))  # creates a torch.cuda.DoubleTensor
tensor([[ 0.1111,  0.2222,  0.3333]], dtype=torch.float64, device='cuda:0')

>>> torch.tensor(3.14159)  # Create a scalar (zero-dimensional tensor)
tensor(3.1416)

>>> torch.tensor([])  # Create an empty tensor (of size (0,))
tensor([])
algorithms.RL_Algorithm.optimizers.trpo.trpo_step(policy_net, states, actions, advantages, max_kl, damping, use_fim=True)[source]

optimize param of policy net given states and actions and advantages using TRPO

algorithms.RL_Algorithm.optimizers.trpo.zeros(*sizes, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor

Returns a tensor filled with the scalar value 0, with the shape defined by the variable argument sizes.

Args:
sizes (int…): a sequence of integers defining the shape of the output tensor.

Can be a variable number of arguments or a collection like a list or tuple.

out (Tensor, optional): the output tensor dtype (torch.dtype, optional): the desired data type of returned tensor.

Default: if None, uses a global default (see torch.set_default_tensor_type()).

layout (torch.layout, optional): the desired layout of returned Tensor.

Default: torch.strided.

device (torch.device, optional): the desired device of returned tensor.

Default: if None, uses the current device for the default tensor type (see torch.set_default_tensor_type()). device will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.

requires_grad (bool, optional): If autograd should record operations on the

returned tensor. Default: False.

Example:

>>> torch.zeros(2, 3)
tensor([[ 0.,  0.,  0.],
        [ 0.,  0.,  0.]])

>>> torch.zeros(5)
tensor([ 0.,  0.,  0.,  0.,  0.])

Module contents