algorithms.RL_Algorithm.optimizers package
Subpackages
- algorithms.RL_Algorithm.optimizers.utils package
- Submodules
- algorithms.RL_Algorithm.optimizers.utils.math module
- algorithms.RL_Algorithm.optimizers.utils.replay_memory module
- algorithms.RL_Algorithm.optimizers.utils.tools module
- algorithms.RL_Algorithm.optimizers.utils.torch module
- algorithms.RL_Algorithm.optimizers.utils.zfilter module
- Module contents
Submodules
algorithms.RL_Algorithm.optimizers.trpo module
algorithms.RL_Algorithm.optimizers.trpo.conjugate_gradients(Avp_f, b, nsteps, rdotr_tol=1e-10)
- Parameters
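Avp_f – function that returns the Hessian-vector product A p for a given vector p
b – negative loss gradient, the right-hand side of the linear system A x = b
nsteps – maximum number of conjugate gradient iterations
rdotr_tol – convergence tolerance on rdotr, the squared residual norm rᵀr
- Returns
the approximate solution x of A x = b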
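The parameters above describe the conjugate gradient method. It solves A x = b using only the products A p that Avp_f supplies. In TRPO, A is typically the damped Fisher matrix or KL Hessian. Below is a minimal sketch of the standard algorithm with the same parameters. It uses plain Python lists in place of torch tensors so it runs without dependencies. It is not copied from this package's source, and the dot helper is hypothetical.

```python
from typing import Callable, List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Hypothetical helper: inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradients(Avp_f: Callable[[Vector], Vector], b: Vector,
                        nsteps: int, rdotr_tol: float = 1e-10) -> Vector:
    """Approximately solve A x = b using only products A p from Avp_f.

    A is assumed symmetric positive definite (e.g. a damped Fisher matrix).
    """
    x = [0.0] * len(b)   # start at x = 0, so the residual b - A x is just b
    r = list(b)          # residual
    p = list(b)          # first search direction is the residual itself
    rdotr = dot(r, r)    # squared residual norm r^T r
    for _ in range(nsteps):
        if rdotr < rdotr_tol:  # converged: residual is small enough
            break
        Avp = Avp_f(p)
        alpha = rdotr / dot(p, Avp)                      # exact minimiser along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Avp)]
        new_rdotr = dot(r, r)
        beta = new_rdotr / rdotr                         # keeps directions A-conjugate
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rdotr = new_rdotr
    return x


# Usage: a 2x2 symmetric positive definite system whose exact solution is [1/11, 7/11].
A = [[4.0, 1.0], [1.0, 3.0]]
x = conjugate_gradients(lambda p: [dot(row, p) for row in A], [1.0, 2.0], nsteps=10)
print(x)
```

This sketch checks the tolerance at the top of the loop. That placement also avoids a division by zero when b is the zero vector.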
algorithms.RL_Algorithm.optimizers.trpo.line_search(model, f, x, fullstep, expected_improve_full, max_backtracks=10, accept_ratio=0.1)
- Parameters
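model – the policy model whose parameters are updated
f – evaluation function (the loss to be improved)
x – current parameters of the model
fullstep – the full update step, tried before any backtracking
expected_improve_full – expected improvement in f from taking the full step
max_backtracks – maximum number of backtracking iterations; iteration n tries the step scaled by 0.5^n
accept_ratio – minimum ratio of actual to expected improvement for a step to be accepted
- Returns
a boolean indicating whether the update step succeeded and, if it did, the new parameters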
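The parameters above describe a backtracking line search. It halves the step until the actual improvement is a large enough fraction of the predicted one. Below is a minimal sketch of that idea. The model argument is dropped and f is assumed to take the parameter vector directly. Both are simplifications for a self-contained example, not this package's interface. Plain lists stand in for tensors.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def line_search(f: Callable[[Vector], float], x: Vector, fullstep: Vector,
                expected_improve_full: float, max_backtracks: int = 10,
                accept_ratio: float = 0.1) -> Tuple[bool, Vector]:
    """Backtracking line search over the steps 0.5**n * fullstep, n = 0, 1, ...

    f is a loss to minimise, so "improvement" means a decrease in f.
    expected_improve_full must be positive (the predicted decrease for fullstep).
    """
    fval = f(x)
    for n in range(max_backtracks):
        stepfrac = 0.5 ** n
        x_new = [xi + stepfrac * si for xi, si in zip(x, fullstep)]
        actual_improve = fval - f(x_new)
        # First-order prediction of the improvement for the shortened step.
        expected_improve = expected_improve_full * stepfrac
        if actual_improve / expected_improve > accept_ratio:
            return True, x_new
    return False, x  # no step accepted: keep the original parameters


# Usage: minimise (x - 1)^2 from x = 0 with an overly long full step of 4.
# The gradient at 0 is -2, so the predicted improvement of the full step is 2 * 4 = 8.
ok, x_new = line_search(lambda v: (v[0] - 1.0) ** 2, [0.0], [4.0], expected_improve_full=8.0)
print(ok, x_new)  # prints: True [1.0]
```

The steps 4 and 2 are rejected because the loss does not decrease. The step 1 yields an actual improvement of 1 against a predicted 2, a ratio of 0.5, which exceeds accept_ratio.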
algorithms.RL_Algorithm.optimizers.trpo.ones(*sizes, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor
Returns a tensor filled with the scalar value 1, with the shape defined by the variable argument sizes.
Args:
- sizes (int…): a sequence of integers defining the shape of the output tensor. Can be a variable number of arguments or a collection like a list or tuple.
- out (Tensor, optional): the output tensor.
- dtype (torch.dtype, optional): the desired data type of returned tensor. Default: if None, uses a global default (see torch.set_default_tensor_type()).
- layout (torch.layout, optional): the desired layout of returned Tensor. Default: torch.strided.
- device (torch.device, optional): the desired device of returned tensor. Default: if None, uses the current device for the default tensor type (see torch.set_default_tensor_type()). device will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.
- requires_grad (bool, optional): If autograd should record operations on the returned tensor. Default: False.
Example:
>>> torch.ones(2, 3)
tensor([[ 1., 1., 1.],
        [ 1., 1., 1.]])
>>> torch.ones(5)
tensor([ 1., 1., 1., 1., 1.])
algorithms.RL_Algorithm.optimizers.trpo.tensor(data, dtype=None, device=None, requires_grad=False, pin_memory=False) → Tensor
Constructs a tensor with data.
Warning
torch.tensor() always copies data. If you have a Tensor data and want to avoid a copy, use torch.Tensor.requires_grad_() or torch.Tensor.detach(). If you have a NumPy ndarray and want to avoid a copy, use torch.as_tensor().
Warning
When data is a tensor x, torch.tensor() reads out ‘the data’ from whatever it is passed, and constructs a leaf variable. Therefore torch.tensor(x) is equivalent to x.clone().detach() and torch.tensor(x, requires_grad=True) is equivalent to x.clone().detach().requires_grad_(True). The equivalents using clone() and detach() are recommended.
Args:
- data (array_like): Initial data for the tensor. Can be a list, tuple, NumPy ndarray, scalar, and other types.
- dtype (torch.dtype, optional): the desired data type of returned tensor. Default: if None, infers data type from data.
- device (torch.device, optional): the desired device of returned tensor. Default: if None, uses the current device for the default tensor type (see torch.set_default_tensor_type()). device will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.
- requires_grad (bool, optional): If autograd should record operations on the returned tensor. Default: False.
- pin_memory (bool, optional): If set, returned tensor would be allocated in the pinned memory. Works only for CPU tensors. Default: False.
Example:
>>> torch.tensor([[0.1, 1.2], [2.2, 3.1], [4.9, 5.2]])
tensor([[ 0.1000, 1.2000],
        [ 2.2000, 3.1000],
        [ 4.9000, 5.2000]])
>>> torch.tensor([0, 1])  # Type inference on data
tensor([ 0, 1])
>>> torch.tensor([[0.11111, 0.222222, 0.3333333]],
...              dtype=torch.float64,
...              device=torch.device('cuda:0'))  # creates a torch.cuda.DoubleTensor
tensor([[ 0.1111, 0.2222, 0.3333]], dtype=torch.float64, device='cuda:0')
>>> torch.tensor(3.14159)  # Create a scalar (zero-dimensional tensor)
tensor(3.1416)
>>> torch.tensor([])  # Create an empty tensor (of size (0,))
tensor([])
algorithms.RL_Algorithm.optimizers.trpo.trpo_step(policy_net, states, actions, advantages, max_kl, damping, use_fim=True)
Optimizes the parameters of policy_net with a TRPO update, given states, actions, and advantages.
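The description above does not say how max_kl and damping enter the update. The standard TRPO update works as follows:

- Conjugate gradients solves F s = -g for a search direction s, where F is the Fisher matrix and g the loss gradient.
- The direction is rescaled by beta = sqrt(max_kl / (0.5 sᵀ F s)), so that the quadratic KL estimate of the step equals max_kl.
- Damping is commonly added to the Fisher-vector product as F v + damping·v.

Whether trpo_step follows exactly these steps is an assumption. damped_fvp and scale_to_trust_region are hypothetical helpers, and plain lists stand in for tensors.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Hypothetical helper: inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def damped_fvp(Fvp: Callable[[Vector], Vector], damping: float) -> Callable[[Vector], Vector]:
    """Hypothetical helper: wrap v -> F v as v -> F v + damping * v."""
    return lambda v: [fv + damping * vi for fv, vi in zip(Fvp(v), v)]


def scale_to_trust_region(stepdir: Vector, Fvp: Callable[[Vector], Vector],
                          loss_grad: Vector, max_kl: float) -> Tuple[Vector, float]:
    """Hypothetical helper: rescale s so that 0.5 * (beta s)^T F (beta s) == max_kl.

    Returns the full step beta * s and its first-order expected improvement
    -g^T (beta s), i.e. the fullstep and expected_improve_full a line search needs.
    """
    shs = 0.5 * dot(stepdir, Fvp(stepdir))  # quadratic KL estimate for s itself
    beta = math.sqrt(max_kl / shs)
    fullstep = [beta * si for si in stepdir]
    expected_improve = -dot(loss_grad, fullstep)
    return fullstep, expected_improve


# Usage: identity Fisher matrix with damping 0.1 and loss gradient g = [-1, 0].
Fd = damped_fvp(lambda v: list(v), damping=0.1)
g = [-1.0, 0.0]
stepdir = [1.0 / 1.1, 0.0]  # solves (F + 0.1 I) s = -g
fullstep, expected = scale_to_trust_region(stepdir, Fd, g, max_kl=0.02)
print(fullstep, expected)
```

The resulting full step and expected improvement correspond to the fullstep and expected_improve_full parameters of line_search.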
algorithms.RL_Algorithm.optimizers.trpo.zeros(*sizes, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor
Returns a tensor filled with the scalar value 0, with the shape defined by the variable argument sizes.
Args:
- sizes (int…): a sequence of integers defining the shape of the output tensor. Can be a variable number of arguments or a collection like a list or tuple.
- out (Tensor, optional): the output tensor.
- dtype (torch.dtype, optional): the desired data type of returned tensor. Default: if None, uses a global default (see torch.set_default_tensor_type()).
- layout (torch.layout, optional): the desired layout of returned Tensor. Default: torch.strided.
- device (torch.device, optional): the desired device of returned tensor. Default: if None, uses the current device for the default tensor type (see torch.set_default_tensor_type()). device will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.
- requires_grad (bool, optional): If autograd should record operations on the returned tensor. Default: False.
Example:
>>> torch.zeros(2, 3)
tensor([[ 0., 0., 0.],
        [ 0., 0., 0.]])
>>> torch.zeros(5)
tensor([ 0., 0., 0., 0., 0.])