algorithms.RL_Algorithm.optimizers package

Subpackages

algorithms.RL_Algorithm.optimizers.utils package

Submodules

algorithms.RL_Algorithm.optimizers.trpo module

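algorithms.RL_Algorithm.optimizers.trpo.conjugate_gradients(Avp_f, b, nsteps, rdotr_tol=1e-10)

Parameters

Avp_f – function returning the Hessian-vector product A·v for a given vector v
b – right-hand side of the system A x = b, here the negative gradient of the loss
nsteps – maximum number of conjugate gradient iterations
rdotr_tol – tolerance on rdotr, the squared residual norm r·r; the iteration stops early once rdotr falls below it

Returns

the approximate solution x of A x = b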
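The conjugate gradient iteration that the parameters above describe can be sketched as follows. This is a minimal NumPy sketch, not the module's PyTorch source; it assumes A is symmetric positive definite (as a damped Hessian or Fisher matrix is) and that rdotr_tol acts as an early-stopping threshold on r·r.

```python
import numpy as np


def conjugate_gradients(Avp_f, b, nsteps, rdotr_tol=1e-10):
    """Approximately solve A x = b using only products Avp_f(v) = A @ v.

    Assumes A is symmetric positive definite.
    """
    b = np.asarray(b, dtype=float)
    x = np.zeros_like(b)
    r = b.copy()                 # residual b - A x (x starts at zero)
    p = b.copy()                 # first search direction is the residual
    rdotr = r @ r                # squared residual norm r·r
    for _ in range(nsteps):
        if rdotr < rdotr_tol:    # assumed role of rdotr_tol: early stop
            break
        Avp = Avp_f(p)
        alpha = rdotr / (p @ Avp)        # step length minimizing along p
        x += alpha * p
        r -= alpha * Avp
        new_rdotr = r @ r
        p = r + (new_rdotr / rdotr) * p  # next direction, A-conjugate to p
        rdotr = new_rdotr
    return x


A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = conjugate_gradients(lambda v: A @ v, b, nsteps=10)
print(x)  # close to [1/11, 7/11]
```

Because the sketch only calls Avp_f, A never has to be formed explicitly; for a 2×2 symmetric positive definite system like the one above, CG reaches the solution in two iterations.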

algorithms.RL_Algorithm.optimizers.trpo.line_search(model, f, x, fullstep, expected_improve_full, max_backtracks=10, accept_ratio=0.1)

Parameters

model – the policy model
f – evaluation function returning the loss to be improved
x – current parameters of the model
fullstep – the full update step
expected_improve_full – expected improvement of the loss for the full step
max_backtracks – maximum number of backtracking iterations; iteration n tries the step 0.5^n · fullstep
accept_ratio – minimum ratio of actual to expected improvement for a step to be accepted

Returns

a boolean indicating whether the update step succeeded and, if it did, the new parameters
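The backtracking behaviour governed by max_backtracks and accept_ratio can be sketched in NumPy as below. This is a minimal sketch, not the module's code: the name backtracking_line_search is hypothetical, and it assumes f takes the parameter vector directly, so no model argument is needed.

```python
import numpy as np


def backtracking_line_search(f, x, fullstep, expected_improve_full,
                             max_backtracks=10, accept_ratio=0.1):
    """Hypothetical simplified line search; f(params) returns the loss.

    Returns (success, params).
    """
    fval = f(x)
    for n in range(max_backtracks):
        stepfrac = 0.5 ** n                              # 1, 0.5, 0.25, ...
        x_new = x + stepfrac * fullstep
        actual_improve = fval - f(x_new)                 # loss decrease achieved
        expected_improve = expected_improve_full * stepfrac  # linear prediction
        if actual_improve / expected_improve > accept_ratio:
            return True, x_new
    return False, x                                      # no acceptable step found


def f(p):
    return float(np.sum(p ** 2))  # toy loss, minimized at the origin


# Gradient at [1, 1] is [2, 2], so the predicted improvement of the full step is 16.
ok, x_new = backtracking_line_search(f, np.array([1.0, 1.0]),
                                     np.array([-4.0, -4.0]), 16.0)
print(ok, x_new)  # True [0. 0.]
```

Here the full step overshoots the minimum: the full step makes the loss worse, the half step does not improve it at all, and the quarter step to [0, 0] is accepted.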

algorithms.RL_Algorithm.optimizers.trpo.ones(*sizes, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor

Returns a tensor filled with the scalar value 1, with the shape defined by the variable argument sizes.

Args:

sizes (int…): a sequence of integers defining the shape of the output tensor. Can be a variable number of arguments or a collection like a list or tuple.
out (Tensor, optional): the output tensor.
dtype (torch.dtype, optional): the desired data type of returned tensor. Default: if None, uses a global default (see torch.set_default_tensor_type()).
layout (torch.layout, optional): the desired layout of returned Tensor. Default: torch.strided.
device (torch.device, optional): the desired device of returned tensor. Default: if None, uses the current device for the default tensor type (see torch.set_default_tensor_type()). device will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.
requires_grad (bool, optional): If autograd should record operations on the returned tensor. Default: False.

Example:

>>> torch.ones(2, 3)
tensor([[ 1.,  1.,  1.],
        [ 1.,  1.,  1.]])

>>> torch.ones(5)
tensor([ 1.,  1.,  1.,  1.,  1.])

algorithms.RL_Algorithm.optimizers.trpo.tensor(data, dtype=None, device=None, requires_grad=False, pin_memory=False) → Tensor

Constructs a tensor with data.

Warning

torch.tensor() always copies data. If you have a Tensor data and want to avoid a copy, use torch.Tensor.requires_grad_() or torch.Tensor.detach(). If you have a NumPy ndarray and want to avoid a copy, use torch.as_tensor().

Warning

When data is a tensor x, torch.tensor() reads out ‘the data’ from whatever it is passed, and constructs a leaf variable. Therefore torch.tensor(x) is equivalent to x.clone().detach() and torch.tensor(x, requires_grad=True) is equivalent to x.clone().detach().requires_grad_(True). The equivalents using clone() and detach() are recommended.

Args:

data (array_like): Initial data for the tensor. Can be a list, tuple, NumPy ndarray, scalar, and other types.
dtype (torch.dtype, optional): the desired data type of returned tensor. Default: if None, infers data type from data.
device (torch.device, optional): the desired device of returned tensor. Default: if None, uses the current device for the default tensor type (see torch.set_default_tensor_type()). device will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.
requires_grad (bool, optional): If autograd should record operations on the returned tensor. Default: False.
pin_memory (bool, optional): If set, returned tensor would be allocated in the pinned memory. Works only for CPU tensors. Default: False.

Example:

>>> torch.tensor([[0.1, 1.2], [2.2, 3.1], [4.9, 5.2]])
tensor([[ 0.1000,  1.2000],
        [ 2.2000,  3.1000],
        [ 4.9000,  5.2000]])

>>> torch.tensor([0, 1])  # Type inference on data
tensor([ 0,  1])

>>> torch.tensor([[0.11111, 0.222222, 0.3333333]],
...              dtype=torch.float64,
...              device=torch.device('cuda:0'))  # creates a torch.cuda.DoubleTensor
tensor([[ 0.1111,  0.2222,  0.3333]], dtype=torch.float64, device='cuda:0')

>>> torch.tensor(3.14159)  # Create a scalar (zero-dimensional tensor)
tensor(3.1416)

>>> torch.tensor([])  # Create an empty tensor (of size (0,))
tensor([])

algorithms.RL_Algorithm.optimizers.trpo.trpo_step(policy_net, states, actions, advantages, max_kl, damping, use_fim=True)

Optimizes the parameters of the policy network with TRPO, given the states, actions, and advantages.
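Assuming trpo_step follows the standard TRPO recipe (the conjugate_gradients and line_search helpers in this module suggest it does, but its source is not shown here), one update can be sketched as below. The NumPy code is an illustration, not the module's implementation: trpo_update, the explicit Fisher matrix, and the toy quadratic loss are hypothetical stand-ins for the policy network and its KL Hessian, use_fim is not modeled, and the line search is simplified to take the parameter vector directly.

```python
import numpy as np


def conjugate_gradients(Avp_f, b, nsteps, rdotr_tol=1e-10):
    x = np.zeros_like(b)
    r, p = b.copy(), b.copy()
    rdotr = r @ r
    for _ in range(nsteps):
        if rdotr < rdotr_tol:
            break
        Avp = Avp_f(p)
        alpha = rdotr / (p @ Avp)
        x += alpha * p
        r -= alpha * Avp
        new_rdotr = r @ r
        p = r + (new_rdotr / rdotr) * p
        rdotr = new_rdotr
    return x


def backtracking_line_search(f, x, fullstep, expected_improve_full,
                             max_backtracks=10, accept_ratio=0.1):
    fval = f(x)
    for n in range(max_backtracks):
        stepfrac = 0.5 ** n
        x_new = x + stepfrac * fullstep
        if (fval - f(x_new)) / (expected_improve_full * stepfrac) > accept_ratio:
            return True, x_new
    return False, x


def trpo_update(loss, grad, fisher, params, max_kl, damping):
    """Hypothetical helper: one TRPO-style update with an explicit Fisher matrix."""
    g = grad(params)

    def fvp(v):
        return fisher @ v + damping * v           # damped Fisher-vector product

    stepdir = conjugate_gradients(fvp, -g, 10)     # approx. natural-gradient direction
    shs = 0.5 * stepdir @ fvp(stepdir)             # quadratic KL estimate for stepdir
    lm = np.sqrt(shs / max_kl)                     # scale so the estimate equals max_kl
    fullstep = stepdir / lm
    expected_improve = -g @ fullstep               # first-order predicted loss decrease
    return backtracking_line_search(loss, params, fullstep, expected_improve)


# Toy problem: quadratic surrogate loss and a fixed Fisher matrix (made-up values).
H = np.array([[2.0, 0.0], [0.0, 1.0]])
c = np.array([1.0, 1.0])
F = np.array([[1.0, 0.2], [0.2, 0.5]])


def loss(th):
    return 0.5 * th @ H @ th - c @ th


def grad(th):
    return H @ th - c


ok, theta = trpo_update(loss, grad, F, np.zeros(2), max_kl=0.01, damping=0.1)
print(ok, loss(theta))  # True, and a loss below loss(0) = 0
```

The step direction solves (F + damping·I) s = -g with conjugate gradients and is then rescaled so that the quadratic KL estimate ½ sᵀ(F + damping·I)s equals max_kl; the line search only shrinks the step if the actual loss decrease falls short of the linear prediction.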

algorithms.RL_Algorithm.optimizers.trpo.zeros(*sizes, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor

Returns a tensor filled with the scalar value 0, with the shape defined by the variable argument sizes.

Args:

sizes (int…): a sequence of integers defining the shape of the output tensor. Can be a variable number of arguments or a collection like a list or tuple.
out (Tensor, optional): the output tensor.
dtype (torch.dtype, optional): the desired data type of returned tensor. Default: if None, uses a global default (see torch.set_default_tensor_type()).
layout (torch.layout, optional): the desired layout of returned Tensor. Default: torch.strided.
device (torch.device, optional): the desired device of returned tensor. Default: if None, uses the current device for the default tensor type (see torch.set_default_tensor_type()). device will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.
requires_grad (bool, optional): If autograd should record operations on the returned tensor. Default: False.

Example:

>>> torch.zeros(2, 3)
tensor([[ 0.,  0.,  0.],
        [ 0.,  0.,  0.]])

>>> torch.zeros(5)
tensor([ 0.,  0.,  0.,  0.,  0.])

Module contents

AutoEnv
