tf2rl.misc package

Submodules

tf2rl.misc.discount_cumsum module

tf2rl.misc.discount_cumsum.discount_cumsum(x, discount)

Forked from rllab for computing discounted cumulative sums of vectors.

Parameters
  • x – np.ndarray or tf.Tensor Vector of inputs

  • discount – float Discount factor

Returns

Discounted cumulative summation. If the input is [x0, x1, x2], then the output is [x0 + discount * x1 + discount^2 * x2, x1 + discount * x2, x2].
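The actual implementation operates on np.ndarray or tf.Tensor inputs; as an illustration of the return value, here is a minimal NumPy sketch of the same right-to-left recurrence (the function name is reused for clarity, but this is not the library's implementation):

```python
import numpy as np

def discount_cumsum(x, discount):
    """Illustrative NumPy sketch of the right-to-left discounted cumulative sum."""
    out = np.zeros_like(x, dtype=float)
    running = 0.0
    # Walk backwards so each entry accumulates the discounted tail after it.
    for i in reversed(range(len(x))):
        running = x[i] + discount * running
        out[i] = running
    return out

# For [1.0, 1.0, 1.0] with discount 0.5:
# [1 + 0.5 + 0.25, 1 + 0.5, 1] == [1.75, 1.5, 1.0]
```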

tf2rl.misc.get_replay_buffer module

tf2rl.misc.get_replay_buffer.get_space_size(space)
tf2rl.misc.get_replay_buffer.get_default_rb_dict(size, env)
tf2rl.misc.get_replay_buffer.get_replay_buffer(policy, env, use_prioritized_rb=False, use_nstep_rb=False, n_step=1, size=None)

tf2rl.misc.huber_loss module

tf2rl.misc.huber_loss.huber_loss(x, delta=1.0)
Parameters
  • x – np.ndarray or tf.Tensor Values for which to compute the Huber loss.

  • delta – float Positive floating point value. Represents the maximum possible gradient magnitude.

Returns: tf.Tensor

The huber loss.
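The Huber loss is quadratic for inputs within delta of zero and linear beyond that, which caps the gradient magnitude at delta. A hedged NumPy sketch of this standard definition (the library's version returns a tf.Tensor instead):

```python
import numpy as np

def huber_loss(x, delta=1.0):
    """Illustrative NumPy sketch: quadratic for |x| <= delta, linear beyond."""
    abs_x = np.abs(x)
    quadratic = 0.5 * np.square(x)          # inside the delta region
    linear = delta * (abs_x - 0.5 * delta)  # outside, slope capped at delta
    return np.where(abs_x <= delta, quadratic, linear)

# huber_loss([0.5, 2.0]) -> [0.125, 1.5]: 0.5 * 0.5**2 and 1.0 * (2.0 - 0.5)
```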

tf2rl.misc.initialize_logger module

tf2rl.misc.initialize_logger.initialize_logger(logging_level=20, output_dir='results/', filename=None, save_log=True)
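The default logging_level of 20 is logging.INFO. A hypothetical sketch of what such a helper typically does (console handler plus an optional file mirror; the name and details below are illustrative, not the library's implementation):

```python
import logging
import os

def initialize_logger_sketch(logging_level=logging.INFO, output_dir='results/',
                             filename=None, save_log=True):
    """Hypothetical sketch: a console logger, optionally mirrored to a file."""
    logger = logging.getLogger('tf2rl_sketch')
    logger.setLevel(logging_level)
    formatter = logging.Formatter('%(asctime)s %(levelname)s %(message)s')
    console = logging.StreamHandler()
    console.setFormatter(formatter)
    logger.addHandler(console)
    if save_log:
        os.makedirs(output_dir, exist_ok=True)
        path = os.path.join(output_dir, filename or 'log.txt')
        file_handler = logging.FileHandler(path)
        file_handler.setFormatter(formatter)
        logger.addHandler(file_handler)
    return logger

logger = initialize_logger_sketch(save_log=False)
logger.info('logger initialized')
```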

tf2rl.misc.normalizer module

class tf2rl.misc.normalizer.Normalizer(mean_only=False)

Bases: object

Normalize input data online. This is based on the following online algorithm: https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Online_algorithm

__init__(mean_only=False)
observe(x)

Compute next mean and std

Parameters

x – float Input data.

normalize(x)
class tf2rl.misc.normalizer.NormalizerNumpy

Bases: object

__init__()
observe(x)
normalize(x, update=False)
get_params()
set_params(n, mean, mean_diff, var)
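The online algorithm referenced above is Welford's method: each observation updates a running mean and a running sum of squared deviations, so mean and variance are available at any point without storing past data. A hypothetical analogue of this observe/normalize interface (class and attribute names are illustrative, not the library's):

```python
class OnlineNormalizer:
    """Hypothetical analogue of NormalizerNumpy using Welford's online algorithm."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.mean_diff = 0.0  # running sum of squared deviations (M2 in Welford's method)

    def observe(self, x):
        # Incrementally update the running mean and M2 with one observation.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.mean_diff += delta * (x - self.mean)

    def normalize(self, x):
        var = self.mean_diff / self.n if self.n > 0 else 1.0
        std = max(var, 1e-8) ** 0.5  # floor the variance to avoid division by zero
        return (x - self.mean) / std
```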

tf2rl.misc.periodic_ops module

Periodic execution ops.

It is very common in Reinforcement Learning for certain ops to only need to be executed periodically, for example: once every N agent steps. The ops below support this common use-case by wrapping a subgraph as a periodic op that only actually executes the underlying computation once every N evaluations of the op, behaving as a no-op in all other calls.

tf2rl.misc.periodic_ops.periodically(body, period, name='periodically')

Periodically performs a tensorflow op.

The body tensorflow op will be executed once every period executions of the periodically op. More specifically, with n the number of times the op has been executed, the body will be executed when n is a nonzero positive multiple of period (i.e. there exists an integer k > 0 such that k * period == n).

If period is 0 or None, no op is performed and a tf.no_op() is returned.

Parameters
  • body (callable) – callable that returns the tensorflow op to be performed every time an internal counter is divisible by the period. The op must have no output (for example, a tf.group()).

  • period (int) – inverse frequency with which to perform the op.

  • name (str) – name of the variable_scope.

Raises
  • TypeError – if body is not a callable.

  • ValueError – if period is negative.

Returns

An op that periodically performs the specified op.
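The counter-and-modulus semantics described above can be sketched in plain Python, independent of the TensorFlow graph machinery (the closure below illustrates the behavior, not the library's op construction):

```python
def periodically(body, period):
    """Plain-Python sketch of the periodic-op semantics (no TensorFlow)."""
    if not callable(body):
        raise TypeError('body must be callable.')
    if period is not None and period < 0:
        raise ValueError('period cannot be negative.')
    state = {'calls': 0}

    def op():
        # With period 0 or None, behave as a no-op on every call.
        if period is None or period == 0:
            return
        state['calls'] += 1
        if state['calls'] % period == 0:
            body()

    return op

calls = []
tick = periodically(lambda: calls.append('ran'), period=3)
for _ in range(7):
    tick()
# body runs on the 3rd and 6th calls, so len(calls) == 2
```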

tf2rl.misc.prepare_output_dir module

MIT License

Copyright (c) 2017 Preferred Networks, Inc.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

tf2rl.misc.prepare_output_dir.is_return_code_zero(args)

Return true if the given command’s return code is zero. All messages to stdout and stderr are suppressed. Forked from https://github.com/chainer/chainerrl/blob/master/chainerrl/misc/is_return_code_zero.py

tf2rl.misc.prepare_output_dir.is_under_git_control()

Return true iff the current directory is under git control.

tf2rl.misc.prepare_output_dir.prepare_output_dir(args, user_specified_dir=None, argv=None, time_format='%Y%m%dT%H%M%S.%f', suffix='')

Prepare a directory for outputting training results. An output directory, which ends with the current datetime string, is created. Then the following information is saved into the directory:

  • args.txt: command line arguments

  • command.txt: command itself

  • environ.txt: environment variables

Additionally, if the current directory is under git control, the following information is saved:

  • git-head.txt: result of git rev-parse HEAD

  • git-status.txt: result of git status

  • git-log.txt: result of git log

  • git-diff.txt: result of git diff

Parameters
  • args (dict or argparse.Namespace) – Arguments to save.

  • user_specified_dir (str or None) – If a str is specified, the output directory is created under that path. If not specified, it is created as a new temporary directory instead.

  • argv (list or None) – The list of command line arguments passed to a script. If not specified, sys.argv is used instead.

  • time_format (str) – Format used to represent the current datetime. The default format is the basic format of ISO 8601.

Returns

Path of the output directory created by this function (str).
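The directory-preparation flow can be sketched with the standard library alone. The sketch below is illustrative (the function name carries a `_sketch` suffix to signal this) and omits the git bookkeeping and the suffix handling details of the real implementation:

```python
import datetime
import os
import sys
import tempfile

def prepare_output_dir_sketch(args, user_specified_dir=None,
                              time_format='%Y%m%dT%H%M%S.%f', suffix=''):
    """Minimal sketch of the output-directory flow (git steps omitted)."""
    time_str = datetime.datetime.now().strftime(time_format) + suffix
    if user_specified_dir is not None:
        # Create the datetime-named directory under the given path.
        outdir = os.path.join(user_specified_dir, time_str)
        os.makedirs(outdir, exist_ok=True)
    else:
        # Otherwise fall back to a fresh temporary directory.
        outdir = tempfile.mkdtemp(prefix=time_str)
    with open(os.path.join(outdir, 'args.txt'), 'w') as f:
        f.write(repr(args))
    with open(os.path.join(outdir, 'command.txt'), 'w') as f:
        f.write(' '.join(sys.argv))
    with open(os.path.join(outdir, 'environ.txt'), 'w') as f:
        f.write(repr(dict(os.environ)))
    return outdir
```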

tf2rl.misc.target_update_ops module

Tensorflow ops for updating target networks.

Tensorflow ops that are used to update a target network from a source network. This is used in agents such as DQN or DPG, which use a target network that changes more slowly than the online network, in order to improve stability.

tf2rl.misc.target_update_ops.update_target_variables(target_variables, source_variables, tau=1.0, use_locking=False, name='update_target_variables')

Returns an op to update a list of target variables from source variables.

The update rule is: target_variable = (1 - tau) * target_variable + tau * source_variable.

Parameters
  • target_variables – a list of the variables to be updated.

  • source_variables – a list of the variables used for the update.

  • tau – weight used to gate the update. The permitted range is 0 < tau <= 1, with small tau representing an incremental update, and tau == 1 representing a full update (that is, a straight copy).

  • use_locking – use tf.Variable.assign’s locking option when assigning source variable values to target variables.

  • name – sets the name_scope for this op.

Raises
  • TypeError – when tau is not a Python float

  • ValueError – when tau is out of range, or the source and target variables have different numbers or shapes.

Returns

An op that executes all the variable updates.
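The update rule itself is easy to see in a NumPy sketch (the real op assigns tf.Variables inside a name_scope; the function name is reused here purely for illustration):

```python
import numpy as np

def update_target_variables(target_variables, source_variables, tau=1.0):
    """NumPy sketch of the soft-update rule; not the TensorFlow op."""
    if not isinstance(tau, float):
        raise TypeError('tau must be a Python float.')
    if not 0.0 < tau <= 1.0:
        raise ValueError('tau must be in the range (0, 1].')
    if len(target_variables) != len(source_variables):
        raise ValueError('source and target variable lists differ in length.')
    for t, s in zip(target_variables, source_variables):
        # target <- (1 - tau) * target + tau * source, in place
        t[...] = (1.0 - tau) * t + tau * s
```

With tau=1.0 this is a straight copy; with a small tau the target trails the source, which is what stabilizes DQN- and DPG-style training.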

tf2rl.misc.target_update_ops.periodic_target_update(target_variables, source_variables, update_period, tau=1.0, use_locking=False, name='periodic_target_update')

Returns an op to periodically update a list of target variables.

The update_target_variables op is executed every update_period executions of the periodic_target_update op.

The update rule is: target_variable = (1 - tau) * target_variable + tau * source_variable.

Parameters
  • target_variables – a list of the variables to be updated.

  • source_variables – a list of the variables used for the update.

  • update_period – inverse frequency with which to apply the update.

  • tau – weight used to gate the update. The permitted range is 0 < tau <= 1, with small tau representing an incremental update, and tau == 1 representing a full update (that is, a straight copy).

  • use_locking – use tf.Variable.assign’s locking option when assigning source variable values to target variables.

  • name – sets the name_scope for this op.

Returns

An op that periodically updates target_variables with source_variables.
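This op composes the two mechanisms above: a call counter gates the soft update. A plain-Python sketch of that composition (illustrative only; the real version builds TensorFlow ops):

```python
import numpy as np

def periodic_target_update(target_variables, source_variables,
                           update_period, tau=1.0):
    """Sketch: a callable that applies the soft update every
    `update_period` invocations and is a no-op otherwise."""
    state = {'calls': 0}

    def op():
        state['calls'] += 1
        if state['calls'] % update_period == 0:
            for t, s in zip(target_variables, source_variables):
                t[...] = (1.0 - tau) * t + tau * s

    return op
```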

Module contents