algorithms.AGen package¶
Subpackages¶
Submodules¶
algorithms.AGen.my_gaussian_gru_policy module¶
- class algorithms.AGen.my_gaussian_gru_policy.myGaussianGRUPolicy(name, env_spec, hidden_dim=32, feature_network=None, state_include_action=True, hidden_nonlinearity=<function tanh>, gru_layer_cls=<class 'sandbox.rocky.tf.core.layers.GRULayer'>, learn_std=True, init_std=1.0, output_nonlinearity=None)[source]¶
Bases: sandbox.rocky.tf.policies.base.StochasticPolicy, sandbox.rocky.tf.core.layers_powered.LayersPowered, rllab.core.serializable.Serializable
- dist_info_sym(obs_var, state_info_vars)[source]¶
Return the symbolic distribution information about the actions.
Parameters:
obs_var: symbolic variable for observations
state_info_vars: a dictionary whose values should contain information about the state of the policy at the time it received the observation
- property distribution¶
Return type: Distribution
- property recurrent¶
Indicates whether the policy is recurrent.
- property state_info_specs¶
Return keys and shapes for the information related to the policy’s state when taking an action.
- property vectorized¶
Indicates whether the policy is vectorized. If True, it should implement get_actions(), and support resetting with multiple simultaneous states.
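The following is a minimal construction sketch, not code from the repository: it assumes an rllab-style environment wrapped so that env.spec is available (for example via the NGSIM helpers in validate_utils below) and follows the usual rllab pattern for building the symbolic inputs that dist_info_sym expects. Names such as env and "agen_policy" are placeholders:

    import tensorflow as tf
    from algorithms.AGen.my_gaussian_gru_policy import myGaussianGRUPolicy

    # Assumption: `env` is an rllab-style TfEnv built elsewhere; only its
    # `spec` attribute is used here.
    policy = myGaussianGRUPolicy(
        name="agen_policy",
        env_spec=env.spec,
        hidden_dim=64,                 # default is 32
        state_include_action=True,
    )

    # The properties documented above:
    assert policy.recurrent            # GRU policies carry hidden state
    assert policy.vectorized           # get_actions() over several states

    # Symbolic distribution info for observation sequences of shape
    # [batch, time, obs_dim]; state_info_specs supplies any extra inputs
    # (e.g. the previous action when state_include_action=True).
    obs_var = policy.observation_space.new_tensor_variable("obs", extra_dims=2)
    state_info_vars = {
        k: tf.placeholder(tf.float32, [None, None] + list(shape), name=k)
        for k, shape in policy.state_info_specs
    }
    dist_info = policy.dist_info_sym(obs_var, state_info_vars)
    dist = policy.distribution         # Distribution object for likelihoods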
algorithms.AGen.rls module¶
algorithms.AGen.validate_utils module¶
- algorithms.AGen.validate_utils.build_ngsim_env(args, exp_dir='/tmp', alpha=0.001, vectorize=False, render_params=None, videoMaking=False)[source]¶
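A hedged usage sketch for build_ngsim_env follows. The args fields shown are drawn from the Namespace printed in the get_ground_truth entry below; which fields the function actually reads, and what it returns, are not documented in this stub, so the result is kept opaque:

    import argparse
    from algorithms.AGen import validate_utils

    # Assumption: an argparse-style Namespace like the one documented below;
    # only a handful of its fields are spelled out for illustration.
    args = argparse.Namespace(
        ngsim_filename="trajdata_i101_trajectories-0750am-0805am.txt",
        env_H=200,
        env_primesteps=50,
        env_multiagent=False,
        n_envs=1,
        remove_ngsim_veh=False,
    )

    # The return value is undocumented here, so it is treated as opaque.
    result = validate_utils.build_ngsim_env(args, exp_dir="/tmp", vectorize=False)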
- algorithms.AGen.validate_utils.get_ground_truth(ngsim_filename: str, h5_filename: str)[source]¶
Namespace(batch_size=10000, critic_batch_size=1000, critic_dropout_keep_prob=0.8, critic_grad_rescale=40.0, critic_hidden_layer_dims=(128, 128, 64), critic_learning_rate=0.0004, decay_reward=False, discount=0.95, do_curriculum=False, env_H=200, env_action_repeat=1, env_multiagent=False, env_primesteps=50, env_reward=0, exp_dir='../../data/experiments', exp_name='singleagent_def_3', expert_filepath='../../data/trajectories/ngsim.h5', gradient_penalty=2.0, itrs_per_decay=25, latent_dim=4, load_params_init='NONE', max_path_length=1000, n_critic_train_epochs=40, n_envs=1, n_envs_end=50, n_envs_start=10, n_envs_step=10, n_itr=1000, n_recognition_train_epochs=30, ngsim_filename='trajdata_i101_trajectories-0750am-0805am.txt', normalize_clip_std_multiple=10.0, params_filepath='', policy_mean_hidden_layer_dims=(128, 128, 64), policy_recurrent=True, policy_std_hidden_layer_dims=(128, 64), recognition_hidden_layer_dims=(128, 64), recognition_learning_rate=0.0005, recurrent_hidden_dim=64, remove_ngsim_veh=False, render_every=25, reward_handler_critic_final_scale=1.0, reward_handler_max_epochs=100, reward_handler_recognition_final_scale=0.2, reward_handler_use_env_rewards=True, scheduler_k=20, trpo_step_size=0.01, use_critic_replay_memory=True, use_infogail=False, validator_render=False, vectorize=True)
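A usage sketch under stated assumptions: the file names are taken from the ngsim_filename and expert_filepath defaults in the Namespace above, and the structure of the return value is not documented in this stub:

    from algorithms.AGen import validate_utils

    # File names follow the defaults listed in the Namespace above.
    ground_truth = validate_utils.get_ground_truth(
        ngsim_filename="trajdata_i101_trajectories-0750am-0805am.txt",
        h5_filename="../../data/trajectories/ngsim.h5",
    )
    print(type(ground_truth))   # return structure is not documented here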
- algorithms.AGen.validate_utils.get_multiagent_ground_truth(ngsim_filename: str, h5_filename: str)[source]¶
Namespace(batch_size=10000, critic_batch_size=1000, critic_dropout_keep_prob=0.8, critic_grad_rescale=40.0, critic_hidden_layer_dims=(128, 128, 64), critic_learning_rate=0.0004, decay_reward=False, discount=0.95, do_curriculum=False, env_H=200, env_action_repeat=1, env_multiagent=False, env_primesteps=50, env_reward=0, exp_dir='../../data/experiments', exp_name='singleagent_def_3', expert_filepath='../../data/trajectories/ngsim.h5', gradient_penalty=2.0, itrs_per_decay=25, latent_dim=4, load_params_init='NONE', max_path_length=1000, n_critic_train_epochs=40, n_envs=1, n_envs_end=50, n_envs_start=10, n_envs_step=10, n_itr=1000, n_recognition_train_epochs=30, ngsim_filename='trajdata_i101_trajectories-0750am-0805am.txt', normalize_clip_std_multiple=10.0, params_filepath='', policy_mean_hidden_layer_dims=(128, 128, 64), policy_recurrent=True, policy_std_hidden_layer_dims=(128, 64), recognition_hidden_layer_dims=(128, 64), recognition_learning_rate=0.0005, recurrent_hidden_dim=64, remove_ngsim_veh=False, render_every=25, reward_handler_critic_final_scale=1.0, reward_handler_max_epochs=100, reward_handler_recognition_final_scale=0.2, reward_handler_use_env_rewards=True, scheduler_k=20, trpo_step_size=0.01, use_critic_replay_memory=True, use_infogail=False, validator_render=False, vectorize=True)
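get_multiagent_ground_truth shares the signature and documented defaults of get_ground_truth; the sketch below simply swaps in the multi-agent variant, with the same caveat that the return structure is not documented here:

    from algorithms.AGen import validate_utils

    # Same call pattern as get_ground_truth, multi-agent variant.
    multiagent_gt = validate_utils.get_multiagent_ground_truth(
        ngsim_filename="trajdata_i101_trajectories-0750am-0805am.txt",
        h5_filename="../../data/trajectories/ngsim.h5",
    )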