tf2rl.misc package
Submodules
tf2rl.misc.discount_cumsum module
- tf2rl.misc.discount_cumsum.discount_cumsum(x, discount)
Forked from rllab for computing discounted cumulative sums of vectors.
- Parameters
x – np.ndarray or tf.Tensor Vector of inputs
discount – float Discount factor
- Returns
Discounted cumulative summation. If the input is [x0, x1, x2], the output is [x0 + discount * x1 + discount^2 * x2, x1 + discount * x2, x2].
- Return type
np.ndarray or tf.Tensor
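The computation can be sketched in plain numpy as a right-to-left running sum (a minimal illustration of the formula above, not the tf2rl implementation, which also accepts tf.Tensor inputs):

```python
import numpy as np

def discount_cumsum(x, discount):
    """Right-to-left discounted cumulative sum: out[i] = sum_k discount**k * x[i + k]."""
    out = np.zeros_like(x, dtype=float)
    running = 0.0
    for i in reversed(range(len(x))):
        running = x[i] + discount * running
        out[i] = running
    return out

print(discount_cumsum(np.array([1.0, 1.0, 1.0]), 0.5))  # [1.75 1.5  1.  ]
```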
tf2rl.misc.get_replay_buffer module
- tf2rl.misc.get_replay_buffer.get_space_size(space)
- tf2rl.misc.get_replay_buffer.get_default_rb_dict(size, env)
- tf2rl.misc.get_replay_buffer.get_replay_buffer(policy, env, use_prioritized_rb=False, use_nstep_rb=False, n_step=1, size=None)
tf2rl.misc.huber_loss module
- tf2rl.misc.huber_loss.huber_loss(x, delta=1.0)
- Parameters
x – np.ndarray or tf.Tensor Values for which to compute the Huber loss.
delta – float Positive floating point value. Represents the maximum possible gradient magnitude.
- Returns
The Huber loss.
- Return type
tf.Tensor
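The loss is quadratic for small residuals and linear beyond delta, which is what caps the gradient magnitude at delta. A numpy sketch of the standard Huber loss (the tf2rl version operates on tensors):

```python
import numpy as np

def huber_loss(x, delta=1.0):
    """Quadratic for |x| <= delta, linear beyond; gradient magnitude capped at delta."""
    abs_x = np.abs(x)
    quadratic = 0.5 * np.square(x)
    linear = delta * (abs_x - 0.5 * delta)
    return np.where(abs_x <= delta, quadratic, linear)

print(huber_loss(np.array([0.5, 2.0])))  # [0.125 1.5  ]
```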
tf2rl.misc.initialize_logger module
- tf2rl.misc.initialize_logger.initialize_logger(logging_level=20, output_dir='results/', filename=None, save_log=True)
tf2rl.misc.normalizer module
- class tf2rl.misc.normalizer.Normalizer(mean_only=False)
Bases:
object
Normalize input data online. This is based on the following: https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Online_algorithm
- __init__(mean_only=False)
- observe(x)
Update the running mean and std with a new observation.
- Parameters
x – float Input data.
- normalize(x)
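The linked online algorithm (Welford's method) can be sketched as follows. This is an illustration of the technique, not the exact tf2rl class; internal attribute names here are hypothetical:

```python
class Normalizer:
    """Online mean/variance via Welford's algorithm."""

    def __init__(self, mean_only=False):
        self.mean_only = mean_only
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def observe(self, x):
        """Update the running statistics with one new value."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def normalize(self, x):
        """Center (and, unless mean_only, scale) x using the running statistics."""
        centered = x - self.mean
        if self.mean_only or self.n < 2:
            return centered
        std = (self.m2 / self.n) ** 0.5
        return centered / std if std > 0 else centered
```

Welford's update is preferred over the naive sum-of-squares formula because it stays numerically stable when the mean is large relative to the variance.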
tf2rl.misc.periodic_ops module
Periodic execution ops.
It is very common in Reinforcement Learning for certain ops to only need to be executed periodically, for example: once every N agent steps. The ops below support this common use-case by wrapping a subgraph as a periodic op that only actually executes the underlying computation once every N evaluations of the op, behaving as a no-op in all other calls.
- tf2rl.misc.periodic_ops.periodically(body, period, name='periodically')
Periodically performs a tensorflow op.
The body tensorflow op is executed once every period executions of the periodically op. More specifically, with n the number of times the op has been executed, the body is executed when n is a positive multiple of period (i.e. there exists an integer k > 0 such that k * period == n).
If period is 0 or None, no op is performed and a tf.no_op() is returned.
- Parameters
body (callable) – callable that returns the tensorflow op to be performed every time an internal counter is divisible by the period. The op must have no output (for example, a tf.group()).
period (int) – inverse frequency with which to perform the op.
name (str) – name of the variable_scope.
- Raises
TypeError – if body is not a callable.
ValueError – if period is negative.
- Returns
An op that periodically performs the specified op.
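The counter-gating behaviour described above can be sketched in plain Python (a simplified stand-in for the tensorflow op, keeping the same error and no-op semantics):

```python
def periodically(body, period):
    """Return a step function that calls `body` once every `period` invocations."""
    if not callable(body):
        raise TypeError("body must be callable")
    if period is not None and period < 0:
        raise ValueError("period must be non-negative")
    counter = 0

    def step():
        nonlocal counter
        if not period:  # period is 0 or None: behave as a no-op
            return
        counter += 1
        if counter % period == 0:
            body()

    return step

calls = []
tick = periodically(lambda: calls.append(1), period=3)
for _ in range(7):
    tick()
print(len(calls))  # body ran at steps 3 and 6 -> 2
```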
tf2rl.misc.prepare_output_dir module
MIT License
Copyright (c) 2017 Preferred Networks, Inc.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
- tf2rl.misc.prepare_output_dir.is_return_code_zero(args)
Return true if the given command’s return code is zero. All messages to stdout or stderr are suppressed. Forked from https://github.com/chainer/chainerrl/blob/master/chainerrl/misc/is_return_code_zero.py
- tf2rl.misc.prepare_output_dir.is_under_git_control()
Return true iff the current directory is under git control.
- tf2rl.misc.prepare_output_dir.prepare_output_dir(args, user_specified_dir=None, argv=None, time_format='%Y%m%dT%H%M%S.%f', suffix='')
Prepare a directory for outputting training results. An output directory, whose name ends with the current datetime string, is created. Then the following information is saved into the directory:
- args.txt: command line arguments
- command.txt: the command itself
- environ.txt: environment variables
Additionally, if the current directory is under git control, the following information is saved:
- git-head.txt: result of git rev-parse HEAD
- git-status.txt: result of git status
- git-log.txt: result of git log
- git-diff.txt: result of git diff
- Parameters
args (dict or argparse.Namespace) – Arguments to save.
user_specified_dir (str or None) – If a str is specified, the output directory is created under that path. If not specified, it is created as a new temporary directory instead.
argv (list or None) – The list of command line arguments passed to a script. If not specified, sys.argv is used instead.
time_format (str) – Format used to represent the current datetime. The default format is the basic format of ISO 8601.
- Returns
Path of the output directory created by this function (str).
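A minimal sketch of the idea, covering only the datetime-named directory and the args.txt/command.txt files (the real function also records environment variables and git state):

```python
import datetime
import json
import os
import sys
import tempfile

def prepare_output_dir(args, user_specified_dir=None, time_format="%Y%m%dT%H%M%S.%f"):
    """Create a datetime-named output directory and save args and the command into it."""
    timestamp = datetime.datetime.now().strftime(time_format)
    base = user_specified_dir if user_specified_dir is not None else tempfile.mkdtemp()
    outdir = os.path.join(base, timestamp)
    os.makedirs(outdir, exist_ok=True)
    with open(os.path.join(outdir, "args.txt"), "w") as f:
        json.dump(args if isinstance(args, dict) else vars(args), f)
    with open(os.path.join(outdir, "command.txt"), "w") as f:
        f.write(" ".join(sys.argv))
    return outdir
```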
tf2rl.misc.target_update_ops module
Tensorflow ops for updating target networks.
Tensorflow ops that are used to update a target network from a source network. This is used in agents such as DQN or DPG, which use a target network that changes more slowly than the online network, in order to improve stability.
- tf2rl.misc.target_update_ops.update_target_variables(target_variables, source_variables, tau=1.0, use_locking=False, name='update_target_variables')
Returns an op to update a list of target variables from source variables.
The update rule is: target_variable = (1 - tau) * target_variable + tau * source_variable.
- Parameters
target_variables – a list of the variables to be updated.
source_variables – a list of the variables used for the update.
tau – weight used to gate the update. The permitted range is 0 < tau <= 1, with small tau representing an incremental update, and tau == 1 representing a full update (that is, a straight copy).
use_locking – use tf.Variable.assign’s locking option when assigning source variable values to target variables.
name – sets the name_scope for this op.
- Raises
TypeError – when tau is not a Python float
ValueError – when tau is out of range, or the source and target variables have different numbers or shapes.
- Returns
An op that executes all the variable updates.
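The soft-update rule above can be sketched in numpy, with the same validation behaviour (a simplified stand-in for the tensorflow op, which assigns tf.Variables):

```python
import numpy as np

def update_target_variables(target_variables, source_variables, tau=1.0):
    """Apply target <- (1 - tau) * target + tau * source to each variable pair in place."""
    if not isinstance(tau, float):
        raise TypeError("tau must be a Python float")
    if not 0.0 < tau <= 1.0:
        raise ValueError("tau must be in (0, 1]")
    if len(target_variables) != len(source_variables):
        raise ValueError("target and source variable lists must have the same length")
    for t, s in zip(target_variables, source_variables):
        t[...] = (1.0 - tau) * t + tau * s

target = [np.zeros(2)]
update_target_variables(target, [np.ones(2)], tau=0.5)
print(target[0])  # [0.5 0.5]
```

With tau=1.0 (the default) this degenerates to a straight copy, which is the hard target update used in DQN; small tau gives the Polyak averaging used in DDPG-style agents.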
- tf2rl.misc.target_update_ops.periodic_target_update(target_variables, source_variables, update_period, tau=1.0, use_locking=False, name='periodic_target_update')
Returns an op to periodically update a list of target variables.
The update_target_variables op is executed every update_period executions of the periodic_target_update op.
The update rule is: target_variable = (1 - tau) * target_variable + tau * source_variable.
- Parameters
target_variables – a list of the variables to be updated.
source_variables – a list of the variables used for the update.
update_period – inverse frequency with which to apply the update.
tau – weight used to gate the update. The permitted range is 0 < tau <= 1, with small tau representing an incremental update, and tau == 1 representing a full update (that is, a straight copy).
use_locking – use tf.Variable.assign’s locking option when assigning source variable values to target variables.
name – sets the name_scope for this op.
- Returns
An op that periodically updates target_variables with source_variables.
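This op simply gates the soft-update rule behind a counter. A plain-Python/numpy sketch of that combination (a simplified stand-in for the tensorflow op):

```python
import numpy as np

def make_periodic_target_update(target, source, update_period, tau=1.0):
    """Return a step function that applies the soft update once every `update_period` calls."""
    counter = 0

    def step():
        nonlocal counter
        counter += 1
        if counter % update_period == 0:
            for t, s in zip(target, source):
                t[...] = (1.0 - tau) * t + tau * s

    return step

target, source = [np.zeros(1)], [np.ones(1)]
step = make_periodic_target_update(target, source, update_period=2, tau=1.0)
step()  # no update yet
step()  # counter reaches 2: full copy
print(target[0])  # [1.]
```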