tf2rl.experiments package
Submodules
tf2rl.experiments.irl_trainer module
- class tf2rl.experiments.irl_trainer.IRLTrainer(policy, env, args, irl, expert_obs, expert_next_obs, expert_act, test_env=None)
Bases:
tf2rl.experiments.trainer.Trainer
Trainer class for inverse reinforcement learning
Command Line Args:
--max-steps (int): The maximum number of steps for training. The default is int(1e6)
--episode-max-steps (int): The maximum number of steps for an episode. The default is int(1e3)
--n-experiments (int): Number of experiments. The default is 1
--show-progress: Call the render function during training
--save-model-interval (int): Interval to save the model. The default is int(1e4)
--save-summary-interval (int): Interval to save the summary. The default is int(1e3)
--model-dir (str): Directory to restore the model from
--dir-suffix (str): Suffix for the directory that stores results
--normalize-obs: Whether to normalize observations
--logdir (str): Output directory name. The default is "results"
--evaluate: Whether to evaluate the trained model
--test-interval (int): Interval to evaluate the trained model. The default is int(1e4)
--show-test-progress: Call the render function during evaluation
--test-episodes (int): Number of test episodes. The default is 5
--save-test-path: Save trajectories of evaluation
--show-test-images: Show input images to neural networks when an episode finishes
--save-test-movie: Save rendering results
--use-prioritized-rb: Use prioritized experience replay
--use-nstep-rb: Use N-step experience replay
--n-step (int): Number of steps for N-step experience reward. The default is 4
--logging-level (DEBUG, INFO, WARNING): Choose the logging level. The default is INFO
--expert-path-dir (str): Path to the directory that contains expert trajectories
- __init__(policy, env, args, irl, expert_obs, expert_next_obs, expert_act, test_env=None)
Initialize Trainer class
- Parameters
policy – Policy to be trained
env (gym.Env) – Environment for training
args (Namespace or dict) – config parameters specified on the command line
irl – Inverse RL algorithm that learns the reward (e.g., GAIL)
expert_obs – Observations from expert trajectories
expert_next_obs – Next observations from expert trajectories
expert_act – Actions from expert trajectories
test_env (gym.Env) – Environment for test.
- static get_argument(parser=None)
Create or update argument parser for command line program
- Parameters
parser (argparse.ArgumentParser, optional) – argument parser
- Returns
argument parser
- Return type
argparse.ArgumentParser
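A minimal usage sketch, modeled on the tf2rl GAIL example: the DDPG/GAIL constructor arguments shown are abbreviated and may differ between versions, and the expert trajectories are assumed to have been saved beforehand with tf2rl.experiments.utils.save_path:

    import gym

    from tf2rl.algos.ddpg import DDPG
    from tf2rl.algos.gail import GAIL
    from tf2rl.experiments.irl_trainer import IRLTrainer
    from tf2rl.experiments.utils import restore_latest_n_traj

    # Gather command-line arguments from the trainer and the IRL algorithm.
    parser = IRLTrainer.get_argument()
    parser = GAIL.get_argument(parser)
    args = parser.parse_args()

    env = gym.make("Pendulum-v0")
    test_env = gym.make("Pendulum-v0")

    policy = DDPG(state_shape=env.observation_space.shape,
                  action_dim=env.action_space.high.size,
                  max_action=env.action_space.high[0])
    irl = GAIL(state_shape=env.observation_space.shape,
               action_dim=env.action_space.high.size)

    # Load expert trajectories saved earlier with save_path().
    expert_trajs = restore_latest_n_traj(args.expert_path_dir,
                                         n_path=20, max_steps=1000)

    trainer = IRLTrainer(policy, env, args, irl,
                         expert_trajs["obses"],
                         expert_trajs["next_obses"],
                         expert_trajs["acts"],
                         test_env)
    trainer()  # start training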
tf2rl.experiments.me_trpo_trainer module
- class tf2rl.experiments.me_trpo_trainer.MeTrpoTrainer(*args, n_eval_episodes_per_model=5, **kwargs)
Bases:
tf2rl.experiments.mpc_trainer.MPCTrainer
Trainer class for Model-Ensemble Trust-Region Policy Optimization (ME-TRPO): https://arxiv.org/abs/1802.10592
Command Line Args:
--max-steps (int): The maximum number of steps for training. The default is int(1e6)
--episode-max-steps (int): The maximum number of steps for an episode. The default is int(1e3)
--n-experiments (int): Number of experiments. The default is 1
--show-progress: Call the render function during training
--save-model-interval (int): Interval to save the model. The default is int(1e4)
--save-summary-interval (int): Interval to save the summary. The default is int(1e3)
--model-dir (str): Directory to restore the model from
--dir-suffix (str): Suffix for the directory that stores results
--normalize-obs: Whether to normalize observations
--logdir (str): Output directory name. The default is "results"
--evaluate: Whether to evaluate the trained model
--test-interval (int): Interval to evaluate the trained model. The default is int(1e4)
--show-test-progress: Call the render function during evaluation
--test-episodes (int): Number of test episodes. The default is 5
--save-test-path: Save trajectories of evaluation
--show-test-images: Show input images to neural networks when an episode finishes
--save-test-movie: Save rendering results
--use-prioritized-rb: Use prioritized experience replay
--use-nstep-rb: Use N-step experience replay
--n-step (int): Number of steps for N-step experience reward. The default is 4
--logging-level (DEBUG, INFO, WARNING): Choose the logging level. The default is INFO
--gpu (int): GPU id. The default is 0
--max-iter (int): Maximum number of iterations. The default is 100
--horizon (int): Number of steps of the planning horizon
--n-sample (int): Number of samples. The default is 1000
--batch-size (int): Batch size. The default is 512
--n-collect-steps (int): Number of steps to collect. The default is 100
--debug: Enable debug mode
- __init__(*args, n_eval_episodes_per_model=5, **kwargs)
Initialize ME-TRPO
- Parameters
policy – Policy to be trained
env (gym.Env) – Environment for training
args (Namespace or dict) – config parameters specified on the command line
test_env (gym.Env) – Environment for test.
reward_fn (callable) – Reward function
buffer_size (int) – The default is int(1e6)
lr (float) – Learning rate for the dynamics model. The default is 0.001
n_eval_episodes_per_model (int) – Number of evaluation episodes per model. The default is 5
- predict_next_state(obses, acts, idx=None)
Predict Next State
- Parameters
obses – Batch of observations
acts – Batch of actions
idx (int) – Index of the dynamics model to use. If None (default), a model is chosen at random.
- Returns
next state
- Return type
np.ndarray
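For example, imagined rollouts can mix the ensemble members by leaving idx unset. A short sketch, assuming trainer is an already constructed MeTrpoTrainer and obses/acts are batches of observations and actions:

    # idx=None samples one of the trained dynamics models uniformly at
    # random, which injects model uncertainty into imagined rollouts.
    next_obses = trainer.predict_next_state(obses, acts)

    # A fixed member of the ensemble can also be queried explicitly.
    next_obses_0 = trainer.predict_next_state(obses, acts, idx=0)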
- update_policy()
Update Policy
- collect_transitions_real_env()
Collect Transitions from Real Environment
- collect_transitions_sim_env()
Generate transitions using dynamics model
- finish_horizon(last_val=0)
TODO: This code is completely identical to the one defined in on_policy_trainer.py. Reuse it.
- evaluate_policy(total_steps)
- static get_argument(parser=None)
Create or update argument parser for command line program
- Parameters
parser (argparse.ArgumentParser, optional) – argument parser
- Returns
argument parser
- Return type
argparse.ArgumentParser
tf2rl.experiments.mpc_trainer module
- class tf2rl.experiments.mpc_trainer.DynamicsModel(*args, **kwargs)
Bases:
tensorflow.python.keras.engine.training.Model
- __init__(input_dim, output_dim, units=[32, 32], name='DymamicsModel', gpu=0)
Initialize DynamicsModel
- Parameters
input_dim (int) – Dimension of the model input
output_dim (int) – Dimension of the model output
units (iterable of int) – The default is [32, 32]
name (str) – The default is "DynamicsModel"
gpu (int) – The default is 0
- call(inputs)
Call Dynamics Model
- Parameters
inputs (tf.Tensor) –
- Returns
tf.Tensor
- predict(inputs)
Generates output predictions for the input samples.
Computation is done in batches. This method is designed for performance with large-scale inputs. For small amounts of input that fit in one batch, directly using __call__ is recommended for faster execution, e.g., model(x), or model(x, training=False) if you have layers such as tf.keras.layers.BatchNormalization that behave differently during inference. Also note that test loss is not affected by regularization layers like noise and dropout.
- Parameters
x – Input samples. It could be:
- A Numpy array (or array-like), or a list of arrays (in case the model has multiple inputs).
- A TensorFlow tensor, or a list of tensors (in case the model has multiple inputs).
- A tf.data dataset.
- A generator or keras.utils.Sequence instance.
A more detailed description of unpacking behavior for iterator types (Dataset, generator, Sequence) is given in the Unpacking behavior for iterator-like inputs section of Model.fit.
batch_size – Integer or None. Number of samples per batch. If unspecified, batch_size will default to 32. Do not specify the batch_size if your data is in the form of dataset, generators, or keras.utils.Sequence instances (since they generate batches).
verbose – Verbosity mode, 0 or 1.
steps – Total number of steps (batches of samples) before declaring the prediction round finished. Ignored with the default value of None. If x is a tf.data dataset and steps is None, predict will run until the input dataset is exhausted.
callbacks – List of keras.callbacks.Callback instances. List of callbacks to apply during prediction. See tf.keras.callbacks.
max_queue_size – Integer. Used for generator or keras.utils.Sequence input only. Maximum size for the generator queue. If unspecified, max_queue_size will default to 10.
workers – Integer. Used for generator or keras.utils.Sequence input only. Maximum number of processes to spin up when using process-based threading. If unspecified, workers will default to 1. If 0, will execute the generator on the main thread.
use_multiprocessing – Boolean. Used for generator or keras.utils.Sequence input only. If True, use process-based threading. If unspecified, use_multiprocessing will default to False. Note that because this implementation relies on multiprocessing, you should not pass non-picklable arguments to the generator as they can’t be passed easily to children processes.
See the discussion of Unpacking behavior for iterator-like inputs for Model.fit. Note that Model.predict uses the same interpretation rules as Model.fit and Model.evaluate, so inputs must be unambiguous for all three methods.
- Returns
Numpy array(s) of predictions.
- Raises
RuntimeError – If model.predict is wrapped in tf.function.
ValueError – In case of mismatch between the provided input data and the model’s expectations, or in case a stateful model receives a number of samples that is not a multiple of the batch size.
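A small sketch of building and querying the model. The input/output dimensions follow the usual MPC wiring of concatenated observation-action inputs predicting observation-shaped outputs, which is an assumption here:

    import numpy as np
    import tensorflow as tf

    from tf2rl.experiments.mpc_trainer import DynamicsModel

    obs_dim, act_dim = 3, 1  # e.g., Pendulum-v0

    # Assumed wiring: concatenated (obs, act) in, observation-shaped prediction out.
    model = DynamicsModel(input_dim=obs_dim + act_dim, output_dim=obs_dim)

    small_batch = tf.constant(np.random.rand(4, obs_dim + act_dim), dtype=tf.float32)
    out = model(small_batch)          # direct __call__ is faster for small inputs

    big_batch = np.random.rand(10000, obs_dim + act_dim).astype(np.float32)
    preds = model.predict(big_batch)  # predict() batches large-scale inputs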
- class tf2rl.experiments.mpc_trainer.RandomPolicy(max_action, act_dim)
Bases:
object
- __init__(max_action, act_dim)
Initialize RandomPolicy
- Parameters
max_action (float) –
act_dim (int) –
- get_action(obs)
Get random action
- Parameters
obs –
- Returns
action
- Return type
float
- get_actions(obses)
Get batch actions
- Parameters
obses –
- Returns
batch actions
- Return type
np.ndarray
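A quick sketch of the policy in isolation, assuming (as the class name suggests) that actions are drawn uniformly from [-max_action, max_action] and that observations are ignored:

    import numpy as np

    from tf2rl.experiments.mpc_trainer import RandomPolicy

    policy = RandomPolicy(max_action=2.0, act_dim=1)  # Pendulum-like action space

    obs = np.zeros(3)                            # observation content is irrelevant
    act = policy.get_action(obs)                 # one random action
    acts = policy.get_actions(np.zeros((8, 3)))  # a batch of 8 random actions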
- class tf2rl.experiments.mpc_trainer.MPCTrainer(policy, env, args, reward_fn, buffer_size=1000000, n_dynamics_model=1, lr=0.001, **kwargs)
Bases:
tf2rl.experiments.trainer.Trainer
Trainer class for Model Predictive Control (MPC): https://arxiv.org/abs/1708.02596
Command Line Args:
--max-steps (int): The maximum number of steps for training. The default is int(1e6)
--episode-max-steps (int): The maximum number of steps for an episode. The default is int(1e3)
--n-experiments (int): Number of experiments. The default is 1
--show-progress: Call the render function during training
--save-model-interval (int): Interval to save the model. The default is int(1e4)
--save-summary-interval (int): Interval to save the summary. The default is int(1e3)
--model-dir (str): Directory to restore the model from
--dir-suffix (str): Suffix for the directory that stores results
--normalize-obs: Whether to normalize observations
--logdir (str): Output directory name. The default is "results"
--evaluate: Whether to evaluate the trained model
--test-interval (int): Interval to evaluate the trained model. The default is int(1e4)
--show-test-progress: Call the render function during evaluation
--test-episodes (int): Number of test episodes. The default is 5
--save-test-path: Save trajectories of evaluation
--show-test-images: Show input images to neural networks when an episode finishes
--save-test-movie: Save rendering results
--use-prioritized-rb: Use prioritized experience replay
--use-nstep-rb: Use N-step experience replay
--n-step (int): Number of steps for N-step experience reward. The default is 4
--logging-level (DEBUG, INFO, WARNING): Choose the logging level. The default is INFO
--gpu (int): GPU id. The default is 0
--max-iter (int): Maximum number of iterations. The default is 100
--horizon (int): Number of steps of the planning horizon
--n-sample (int): Number of samples. The default is 1000
--batch-size (int): Batch size. The default is 512
- __init__(policy, env, args, reward_fn, buffer_size=1000000, n_dynamics_model=1, lr=0.001, **kwargs)
Initialize MPCTrainer class
- Parameters
policy – Policy to be trained
env (gym.Env) – Environment for training
args (Namespace or dict) – config parameters specified on the command line
test_env (gym.Env) – Environment for test.
reward_fn (callable) – Reward function
buffer_size (int) – The default is int(1e6)
n_dynamics_model (int) – Number of dynamics models. The default is 1
lr (float) – Learning rate for the dynamics model. The default is 0.001
- predict_next_state(obses, acts)
Predict Next State
- Parameters
obses – Batch of observations
acts – Batch of actions
- Returns
next state
- Return type
np.ndarray
- collect_episodes(n_rollout=1)
Collect Episodes
- Parameters
n_rollout (int) – Number of rollouts. The default is 1
- fit_dynamics(n_epoch=1)
Fit dynamics
- Parameters
n_epoch (int) – Number of epochs to fit. The default is 1
- static get_argument(parser=None)
Create or update argument parser for command line program
- Parameters
parser (argparse.ArgumentParser, optional) – argument parser
- Returns
argument parser
- Return type
argparse.ArgumentParser
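A hedged driver sketch: RandomPolicy generates exploratory actions while MPC plans against a task-specific reward_fn, which must score imagined (observation, action) batches without touching the environment. The reward shown is a placeholder, and trainer() follows the Trainer calling convention:

    import gym
    import numpy as np

    from tf2rl.experiments.mpc_trainer import MPCTrainer, RandomPolicy

    parser = MPCTrainer.get_argument()
    args = parser.parse_args()

    env = gym.make("Pendulum-v0")
    policy = RandomPolicy(max_action=env.action_space.high[0],
                          act_dim=env.action_space.high.size)

    def reward_fn(obses, acts):
        # Placeholder reward model: penalize large actions.
        return -np.linalg.norm(acts, axis=-1)

    trainer = MPCTrainer(policy, env, args, reward_fn=reward_fn)
    trainer()  # collect episodes, fit dynamics, and plan with MPC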
tf2rl.experiments.on_policy_trainer module
- class tf2rl.experiments.on_policy_trainer.OnPolicyTrainer(*args, **kwargs)
Bases:
tf2rl.experiments.trainer.Trainer
Trainer class for on-policy reinforcement learning
Command Line Args:
--max-steps (int): The maximum number of steps for training. The default is int(1e6)
--episode-max-steps (int): The maximum number of steps for an episode. The default is int(1e3)
--n-experiments (int): Number of experiments. The default is 1
--show-progress: Call the render function during training
--save-model-interval (int): Interval to save the model. The default is int(1e4)
--save-summary-interval (int): Interval to save the summary. The default is int(1e3)
--model-dir (str): Directory to restore the model from
--dir-suffix (str): Suffix for the directory that stores results
--normalize-obs: Whether to normalize observations
--logdir (str): Output directory name. The default is "results"
--evaluate: Whether to evaluate the trained model
--test-interval (int): Interval to evaluate the trained model. The default is int(1e4)
--show-test-progress: Call the render function during evaluation
--test-episodes (int): Number of test episodes. The default is 5
--save-test-path: Save trajectories of evaluation
--show-test-images: Show input images to neural networks when an episode finishes
--save-test-movie: Save rendering results
--use-prioritized-rb: Use prioritized experience replay
--use-nstep-rb: Use N-step experience replay
--n-step (int): Number of steps for N-step experience reward. The default is 4
--logging-level (DEBUG, INFO, WARNING): Choose the logging level. The default is INFO
- __init__(*args, **kwargs)
Initialize On-Policy Trainer
- Parameters
policy – Policy to be trained
env (gym.Env) – Environment for training
args (Namespace or dict) – config parameters specified on the command line
test_env (gym.Env) – Environment for test.
- finish_horizon(last_val=0)
Finish horizon
- evaluate_policy(total_steps)
Evaluate policy
- Parameters
total_steps (int) – Current total steps of training
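A minimal sketch with a PPO policy; constructor arguments are abbreviated and exact names may differ between tf2rl versions:

    import gym

    from tf2rl.algos.ppo import PPO
    from tf2rl.experiments.on_policy_trainer import OnPolicyTrainer

    parser = OnPolicyTrainer.get_argument()
    parser = PPO.get_argument(parser)
    args = parser.parse_args()

    env = gym.make("Pendulum-v0")
    test_env = gym.make("Pendulum-v0")

    policy = PPO(state_shape=env.observation_space.shape,
                 action_dim=env.action_space.high.size,
                 is_discrete=False,
                 max_action=env.action_space.high[0])

    trainer = OnPolicyTrainer(policy, env, args, test_env=test_env)
    trainer()  # run the on-policy train/evaluate loop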
tf2rl.experiments.trainer module
- class tf2rl.experiments.trainer.Trainer(policy, env, args, test_env=None)
Bases:
object
Trainer class for off-policy reinforcement learning
Command Line Args:
--max-steps (int): The maximum number of steps for training. The default is int(1e6)
--episode-max-steps (int): The maximum number of steps for an episode. The default is int(1e3)
--n-experiments (int): Number of experiments. The default is 1
--show-progress: Call the render function during training
--save-model-interval (int): Interval to save the model. The default is int(1e4)
--save-summary-interval (int): Interval to save the summary. The default is int(1e3)
--model-dir (str): Directory to restore the model from
--dir-suffix (str): Suffix for the directory that stores results
--normalize-obs: Whether to normalize observations
--logdir (str): Output directory name. The default is "results"
--evaluate: Whether to evaluate the trained model
--test-interval (int): Interval to evaluate the trained model. The default is int(1e4)
--show-test-progress: Call the render function during evaluation
--test-episodes (int): Number of test episodes. The default is 5
--save-test-path: Save trajectories of evaluation
--show-test-images: Show input images to neural networks when an episode finishes
--save-test-movie: Save rendering results
--use-prioritized-rb: Use prioritized experience replay
--use-nstep-rb: Use N-step experience replay
--n-step (int): Number of steps for N-step experience reward. The default is 4
--logging-level (DEBUG, INFO, WARNING): Choose the logging level. The default is INFO
- __init__(policy, env, args, test_env=None)
Initialize Trainer class
- Parameters
policy – Policy to be trained
env (gym.Env) – Environment for training
args (Namespace or dict) – config parameters specified on the command line
test_env (gym.Env) – Environment for test.
- evaluate_policy_continuously()
Periodically search for the latest checkpoint, and keep evaluating with the latest model until the user kills the process.
- evaluate_policy(total_steps)
- static get_argument(parser=None)
Create or update argument parser for command line program
- Parameters
parser (argparse.ArgumentParser, optional) – argument parser
- Returns
argument parser
- Return type
argparse.ArgumentParser
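A minimal sketch with an off-policy algorithm such as DDPG (arguments abbreviated; passing --evaluate on the command line switches the loop to evaluation):

    import gym

    from tf2rl.algos.ddpg import DDPG
    from tf2rl.experiments.trainer import Trainer

    parser = Trainer.get_argument()
    parser = DDPG.get_argument(parser)
    args = parser.parse_args()

    env = gym.make("Pendulum-v0")
    test_env = gym.make("Pendulum-v0")

    policy = DDPG(state_shape=env.observation_space.shape,
                  action_dim=env.action_space.high.size,
                  max_action=env.action_space.high[0])

    trainer = Trainer(policy, env, args, test_env=test_env)
    trainer()  # off-policy train loop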
tf2rl.experiments.utils module
- tf2rl.experiments.utils.save_path(samples, filename)
- tf2rl.experiments.utils.restore_latest_n_traj(dirname, n_path=10, max_steps=None)
- tf2rl.experiments.utils.get_filenames(dirname, n_path=None)
- tf2rl.experiments.utils.load_trajectories(filenames, max_steps=None)
- tf2rl.experiments.utils.frames_to_gif(frames, prefix, save_dir, interval=50, fps=30)
Convert a list of frames to a GIF file
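A round-trip sketch of the trajectory helpers. The dictionary keys and file layout here are assumptions based on how IRLTrainer consumes expert data; check the saved files for the exact schema:

    import numpy as np

    from tf2rl.experiments.utils import save_path, restore_latest_n_traj

    # Assumed schema: one dict of aligned arrays per rollout
    # (the target directory must already exist).
    samples = {"obs": np.zeros((100, 3)),
               "act": np.zeros((100, 1)),
               "next_obs": np.zeros((100, 3))}
    save_path(samples, filename="results/expert/traj_0.pkl")

    # Load the 10 most recent trajectory files from the same directory,
    # truncating each to at most 1000 steps.
    trajs = restore_latest_n_traj("results/expert", n_path=10, max_steps=1000)
    obses, next_obses, acts = trajs["obses"], trajs["next_obses"], trajs["acts"]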