tf2rl.envs package
Submodules
tf2rl.envs.atari_wrapper module
The MIT License
Copyright (c) 2017 OpenAI (http://openai.com)
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
- class tf2rl.envs.atari_wrapper.NoopResetEnv(env, noop_max=30)
Bases:
gym.core.Wrapper
- __init__(env, noop_max=30)
Sample initial states by taking a random number of no-ops on reset. No-op is assumed to be action 0.
- reset(**kwargs)
Do no-op action for a number of steps in [1, noop_max].
- step(ac)
Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.
Accepts an action and returns a tuple (observation, reward, done, info).
- Parameters
action (object) – an action provided by the agent
- Returns
observation (object) – agent’s observation of the current environment
reward (float) – amount of reward returned after previous action
done (bool) – whether the episode has ended, in which case further step() calls will return undefined results
info (dict) – contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
- Return type
tuple
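The no-op reset described for NoopResetEnv can be sketched without gym as follows. This is a minimal illustration, not the wrapper's actual code: CountingEnv and noop_reset are hypothetical names, and restarting when a no-op ends the episode is an assumption.

```python
import random


class CountingEnv:
    """Hypothetical stand-in environment; the observation is the step count."""

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= 100  # the toy episode ends after 100 steps
        return self.t, 0.0, done, {}


def noop_reset(env, noop_max=30, rng=random):
    """Reset env, then take between 1 and noop_max no-op actions (action 0)."""
    obs = env.reset()
    noops = rng.randint(1, noop_max)  # inclusive on both ends
    for _ in range(noops):
        obs, _, done, _ = env.step(0)
        if done:  # assumption: restart if the no-ops ended the episode
            obs = env.reset()
    return obs


env = CountingEnv()
first_obs = noop_reset(env, noop_max=30, rng=random.Random(0))
```

Because the toy observation equals the number of steps taken, first_obs shows how many no-ops were applied before the agent gets control.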
- class tf2rl.envs.atari_wrapper.FireResetEnv(env)
Bases:
gym.core.Wrapper
- __init__(env)
Take action on reset for environments that are fixed until firing.
- reset(**kwargs)
Resets the environment to an initial state and returns an initial observation.
Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.
- Returns
observation (object) – the initial observation.
- step(ac)
Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.
Accepts an action and returns a tuple (observation, reward, done, info).
- Parameters
action (object) – an action provided by the agent
- Returns
observation (object) – agent’s observation of the current environment
reward (float) – amount of reward returned after previous action
done (bool) – whether the episode has ended, in which case further step() calls will return undefined results
info (dict) – contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
- Return type
tuple
- class tf2rl.envs.atari_wrapper.EpisodicLifeEnv(env)
Bases:
gym.core.Wrapper
- __init__(env)
Make end-of-life == end-of-episode, but only reset on true game over. Done by DeepMind for the DQN and co. since it helps value estimation.
- step(action)
Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.
Accepts an action and returns a tuple (observation, reward, done, info).
- Parameters
action (object) – an action provided by the agent
- Returns
observation (object) – agent’s observation of the current environment
reward (float) – amount of reward returned after previous action
done (bool) – whether the episode has ended, in which case further step() calls will return undefined results
info (dict) – contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
- Return type
tuple
- reset(**kwargs)
Reset only when lives are exhausted. This way all states are still reachable even though lives are episodic, and the learner need not know about any of this behind-the-scenes.
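The end-of-life behaviour described for EpisodicLifeEnv can be illustrated with a small self-contained sketch. LivesEnv and EpisodicLife are hypothetical names, the toy game exposes its lives as a plain attribute, and continuing a game with a no-op step is an assumption of this sketch rather than a statement about the wrapper's code.

```python
class LivesEnv:
    """Hypothetical game with 3 lives; every third step costs one life."""

    def reset(self):
        self.lives = 3
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        if self.t % 3 == 0:
            self.lives -= 1
        game_over = self.lives == 0
        return self.t, 1.0, game_over, {}


class EpisodicLife:
    """Report done on every lost life, but reset the game only on game over."""

    def __init__(self, env):
        self.env = env
        self.lives = 0
        self.was_real_done = True

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.was_real_done = done
        lost_life = 0 < self.env.lives < self.lives
        self.lives = self.env.lives
        return obs, reward, done or lost_life, info

    def reset(self):
        if self.was_real_done:
            obs = self.env.reset()
        else:
            # Continue the same game: advance with a no-op instead of resetting.
            obs, _, _, _ = self.env.step(0)
        self.lives = self.env.lives
        return obs


env = EpisodicLife(LivesEnv())
env.reset()
episode_lengths = []
for _ in range(3):
    steps, done = 0, False
    while not done:
        _, _, done, _ = env.step(0)
        steps += 1
    episode_lengths.append(steps)
    env.reset()
```

One underlying game is seen by the learner as three short episodes, and the underlying environment is reset only after the last life is lost.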
- class tf2rl.envs.atari_wrapper.MaxAndSkipEnv(env, skip=4)
Bases:
gym.core.Wrapper
- __init__(env, skip=4)
Return only every skip-th frame
- step(action)
Repeat action, sum reward, and max over last observations.
- reset(**kwargs)
Resets the environment to an initial state and returns an initial observation.
Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.
- Returns
observation (object) – the initial observation.
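The MaxAndSkipEnv step (repeat the action, sum the reward, max over the last observations) can be sketched as below. RampEnv and max_and_skip_step are hypothetical names, frames are plain lists rather than arrays, and pooling over exactly the last two frames is an assumption borrowed from the OpenAI baselines wrapper of the same name.

```python
class RampEnv:
    """Hypothetical environment whose 2-pixel frame changes every step."""

    def reset(self):
        self.t = 0
        return [0, 0]

    def step(self, action):
        self.t += 1
        frame = [self.t, 10 - self.t]  # one pixel rises, the other falls
        return frame, 1.0, self.t >= 10, {}


def max_and_skip_step(env, action, skip=4):
    """Repeat action skip times; sum rewards; max-pool the last two frames."""
    total_reward = 0.0
    last_frames = []
    done, info = False, {}
    for _ in range(skip):
        frame, reward, done, info = env.step(action)
        last_frames = (last_frames + [frame])[-2:]  # keep at most two frames
        total_reward += reward
        if done:
            break
    # Element-wise maximum across the retained frames.
    pooled = [max(pixels) for pixels in zip(*last_frames)]
    return pooled, total_reward, done, info


env = RampEnv()
env.reset()
obs, reward, done, _ = max_and_skip_step(env, action=0, skip=4)
```

Taking the maximum over consecutive frames is a common way to hide flicker in Atari frames, where some sprites are drawn only on alternating frames.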
- class tf2rl.envs.atari_wrapper.ClipRewardEnv(env)
Bases:
gym.core.RewardWrapper
- __init__(env)
- reward(reward)
Bin reward to {+1, 0, -1} by its sign.
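Binning by sign, as described for ClipRewardEnv.reward, amounts to a sign function. A minimal sketch, with clip_reward as a hypothetical standalone helper:

```python
def clip_reward(reward):
    """Map a reward to +1, 0, or -1 according to its sign."""
    return (reward > 0) - (reward < 0)


clipped = [clip_reward(r) for r in (7.5, 0.0, -200)]
```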
- class tf2rl.envs.atari_wrapper.WarpFrame(env, width=84, height=84, grayscale=True, dict_space_key=None)
Bases:
gym.core.ObservationWrapper
- __init__(env, width=84, height=84, grayscale=True, dict_space_key=None)
Warp frames to 84x84 as done in the Nature paper and later work. If the environment uses dictionary observations, dict_space_key can be specified which indicates which observation should be warped.
- observation(obs)
- class tf2rl.envs.atari_wrapper.ProcessFrame84(env=None)
Bases:
gym.core.ObservationWrapper
- __init__(env=None)
- observation(obs)
- static process(frame)
- class tf2rl.envs.atari_wrapper.FrameStack(env, k)
Bases:
gym.core.Wrapper
- __init__(env, k)
Stack the k last frames. Returns a lazy array, which is much more memory efficient than copying the frames into every observation. See also baselines.common.atari_wrappers.LazyFrames.
- reset()
Resets the environment to an initial state and returns an initial observation.
Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.
- Returns
observation (object) – the initial observation.
- step(action)
Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.
Accepts an action and returns a tuple (observation, reward, done, info).
- Parameters
action (object) – an action provided by the agent
- Returns
observation (object) – agent’s observation of the current environment
reward (float) – amount of reward returned after previous action
done (bool) – whether the episode has ended, in which case further step() calls will return undefined results
info (dict) – contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
- Return type
tuple
- class tf2rl.envs.atari_wrapper.ScaledFloatFrame(env)
Bases:
gym.core.ObservationWrapper
- __init__(env)
- observation(observation)
- class tf2rl.envs.atari_wrapper.LazyFrames(frames)
Bases:
object
- __init__(frames)
This object ensures that common frames between the observations are only stored once. It exists purely to optimize memory usage which can be huge for DQN’s 1M frames replay buffers. This object should only be converted to numpy array before being passed to the model. You’d not believe how complex the previous solution was.
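The memory saving behind FrameStack and LazyFrames comes from consecutive observations holding references to the same frame objects. The sketch below illustrates that idea with plain lists; LazyFramesSketch and FrameStacker are hypothetical names, and the real classes work on image arrays and convert to a numpy array instead of a flat list.

```python
from collections import deque


class LazyFramesSketch:
    """Hold references to frames; concatenate only when materialized."""

    def __init__(self, frames):
        self._frames = frames  # list of references, the frames are not copied

    def materialize(self):
        # Join the k frames into one flat observation on demand.
        return [pixel for frame in self._frames for pixel in frame]


class FrameStacker:
    """Keep the k most recent frames in a bounded deque."""

    def __init__(self, k):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, first_frame):
        for _ in range(self.k):  # fill the stack with the first frame
            self.frames.append(first_frame)
        return LazyFramesSketch(list(self.frames))

    def push(self, frame):
        self.frames.append(frame)  # the oldest frame drops out automatically
        return LazyFramesSketch(list(self.frames))


stacker = FrameStacker(k=3)
f0, f1 = [0, 0], [1, 1]
obs_a = stacker.reset(f0)
obs_b = stacker.push(f1)
# Both observations point at the very same frame object.
shared = obs_a._frames[-1] is obs_b._frames[0]
```

Storing obs_a and obs_b in a replay buffer therefore keeps one copy of f0, not several; the full observation is built only when materialize() is called.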
- class tf2rl.envs.atari_wrapper.NdarrayFrames(env)
Bases:
gym.core.Wrapper
- __init__(env)
- reset()
Resets the environment to an initial state and returns an initial observation.
Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.
- Returns
observation (object) – the initial observation.
- step(action)
Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.
Accepts an action and returns a tuple (observation, reward, done, info).
- Parameters
action (object) – an action provided by the agent
- Returns
observation (object) – agent’s observation of the current environment
reward (float) – amount of reward returned after previous action
done (bool) – whether the episode has ended, in which case further step() calls will return undefined results
info (dict) – contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
- Return type
tuple
- tf2rl.envs.atari_wrapper.make_atari(env_id, max_episode_steps=None)
- tf2rl.envs.atari_wrapper.wrap_deepmind(env, episode_life=True, clip_rewards=True, frame_stack=False, scale=False)
Configure environment for DeepMind-style Atari.
- tf2rl.envs.atari_wrapper.wrap_dqn(env, stack_frames=4, episodic_life=True, reward_clipping=True, wrap_ndarray=False)
Apply a common set of wrappers for Atari games.
tf2rl.envs.dmc_wrapper module
- class tf2rl.envs.dmc_wrapper.DMCWrapper(env, k, obs_shape, wait_ms=33.333333333333336, **kwargs)
Bases:
tf2rl.envs.frame_stack_wrapper.FrameStack
Wrapper class to visualize DMC environments.
- __init__(env, k, obs_shape, wait_ms=33.333333333333336, **kwargs)
- render()
Renders the environment.
The set of supported modes varies per environment. (And some environments do not support rendering at all.) By convention, if mode is:
human: render to the current display or terminal and return nothing. Usually for human consumption.
rgb_array: Return a numpy.ndarray with shape (x, y, 3), representing RGB values for an x-by-y pixel image, suitable for turning into a video.
ansi: Return a string (str) or StringIO.StringIO containing a terminal-style text representation. The text can include newlines and ANSI escape sequences (e.g. for colors).
Note
Make sure that your class’s metadata ‘render.modes’ key includes the list of supported modes. It’s recommended to call super() in implementations to use the functionality of this method.
- Parameters
mode (str) – the mode to render with
Example:
    class MyEnv(Env):
        metadata = {'render.modes': ['human', 'rgb_array']}

        def render(self, mode='human'):
            if mode == 'rgb_array':
                return np.array(...)  # return RGB frame suitable for video
            elif mode == 'human':
                ...  # pop up a window and render
            else:
                super(MyEnv, self).render(mode=mode)  # just raise an exception
tf2rl.envs.env_utils module
- tf2rl.envs.env_utils.get_act_dim(env)
tf2rl.envs.frame_stack_wrapper module
- class tf2rl.envs.frame_stack_wrapper.FrameStack(env, k, obs_shape, channel_first=False)
Bases:
gym.core.Wrapper
- __init__(env, k, obs_shape, channel_first=False)
- reset()
Resets the environment to an initial state and returns an initial observation.
Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.
- Returns
observation (object) – the initial observation.
- step(action)
Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.
Accepts an action and returns a tuple (observation, reward, done, info).
- Parameters
action (object) – an action provided by the agent
- Returns
observation (object) – agent’s observation of the current environment
reward (float) – amount of reward returned after previous action
done (bool) – whether the episode has ended, in which case further step() calls will return undefined results
info (dict) – contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
- Return type
tuple
tf2rl.envs.multi_thread_env module
- class tf2rl.envs.multi_thread_env.MultiThreadEnv(env_fn, batch_size, thread_pool=4, max_episode_steps=1000)
Bases:
object
Holds multiple environments. When step() is called, all of them advance by one step.
This class provides TensorFlow operators for manipulating multiple environments.
- __init__(env_fn, batch_size, thread_pool=4, max_episode_steps=1000)
- Parameters
env_fn (function) – Function to make an environment
batch_size (int) – Batch size
thread_pool (int) – Thread pool size
max_episode_steps (int) – Maximum number of steps in an episode
- property original_env
- step(actions, name=None)
- Parameters
actions (tf.Tensor) – Actions whose shape is float32[batch_size, dim_action]
name (str) – Operator name
- Returns
obs (tf.Tensor) – [batch_size, dim_obs]
reward (tf.Tensor) – [batch_size]
done (tf.Tensor) – [batch_size]
env_info – None
- py_step(actions)
- Parameters
actions (np.array) – Actions whose shape is [batch_size, dim_action]
- Returns
obs (np.array)
reward (np.array)
done (np.array)
- py_observation()
- py_reset()
- property max_action
- property min_action
- property state_dim
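The idea of advancing a batch of environments together on a thread pool can be sketched with the standard library alone. This is an illustration under stated assumptions, not MultiThreadEnv itself: BatchedEnvs and CounterEnv are hypothetical names, the TensorFlow operator layer and max_episode_steps handling are left out, and results are plain lists rather than arrays.

```python
from concurrent.futures import ThreadPoolExecutor


class CounterEnv:
    """Hypothetical environment: the observation is the running sum of actions."""

    def reset(self):
        self.total = 0.0
        return self.total

    def step(self, action):
        self.total += action
        return self.total, -abs(action), self.total >= 10.0, {}


class BatchedEnvs:
    """Step batch_size environments in parallel on a thread pool."""

    def __init__(self, env_fn, batch_size, thread_pool=4):
        self.envs = [env_fn() for _ in range(batch_size)]
        self.pool = ThreadPoolExecutor(max_workers=thread_pool)

    def py_reset(self):
        return list(self.pool.map(lambda env: env.reset(), self.envs))

    def py_step(self, actions):
        # map() preserves input order, so results line up with self.envs.
        results = list(self.pool.map(
            lambda pair: pair[0].step(pair[1]), zip(self.envs, actions)))
        obs, rewards, dones, _ = zip(*results)
        return list(obs), list(rewards), list(dones)


batch = BatchedEnvs(CounterEnv, batch_size=3, thread_pool=2)
batch.py_reset()
obs, rewards, dones = batch.py_step([1.0, 2.0, 12.0])
batch.pool.shutdown()
```

Each environment is touched by only one task per call, so no locking is needed in this sketch; the thread pool size can be smaller than the batch size.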
tf2rl.envs.normalizer module
MIT License
Copyright (c) 2017 Preferred Networks, Inc.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
- class tf2rl.envs.normalizer.EmpiricalNormalizer(shape, batch_axis=0, eps=0.01, dtype=<class 'numpy.float32'>, until=None, clip_threshold=None)
Bases:
object
Normalize mean and variance of values based on empirical values.
- Parameters
shape (int or tuple of int) – Shape of input values except batch axis.
batch_axis (int) – Batch axis.
eps (float) – Small value for stability.
dtype (dtype) – Dtype of input values.
until – If this arg is specified, the link learns input values until the sum of batch sizes exceeds it.
- __init__(shape, batch_axis=0, eps=0.01, dtype=<class 'numpy.float32'>, until=None, clip_threshold=None)
- property mean
- property std
- experience(x)
Learn input values without computing their output values.
- inverse(y)
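Empirical normalization keeps running estimates of mean and variance and uses them to standardize inputs. The sketch below shows one way to do this for a scalar stream, using the standard batch-merge formula for variance. RunningNormalizer is a hypothetical class; the exact update rule, the handling of shape and batch_axis, and the until and clip_threshold options of EmpiricalNormalizer are not reproduced here.

```python
import math


class RunningNormalizer:
    """Track mean and variance of a scalar stream, one batch at a time."""

    def __init__(self, eps=0.01):
        self.eps = eps
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the mean

    @property
    def std(self):
        return math.sqrt(self.m2 / self.count) if self.count else 1.0

    def experience(self, batch):
        """Update the statistics from a batch without normalizing it."""
        n_b = len(batch)
        mean_b = sum(batch) / n_b
        m2_b = sum((x - mean_b) ** 2 for x in batch)
        n = self.count + n_b
        delta = mean_b - self.mean
        # Merge the batch statistics into the running statistics.
        self.m2 += m2_b + delta ** 2 * self.count * n_b / n
        self.mean += delta * n_b / n
        self.count = n

    def normalize(self, x):
        return (x - self.mean) / (self.std + self.eps)

    def inverse(self, y):
        return y * (self.std + self.eps) + self.mean


norm = RunningNormalizer()
norm.experience([1.0, 2.0])
norm.experience([3.0, 4.0])
round_trip = norm.inverse(norm.normalize(3.0))
```

The eps term keeps the division stable when the observed variance is close to zero, and inverse() undoes normalize() exactly because both use the same mean and scale.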
tf2rl.envs.utils module
- tf2rl.envs.utils.is_discrete(space)
- tf2rl.envs.utils.get_act_dim(action_space)
- tf2rl.envs.utils.is_mujoco_env(env)
- tf2rl.envs.utils.is_atari_env(env)
- tf2rl.envs.utils.make(id, **kwargs)
Make gym.Env with version tolerance
- Parameters
id (str) – Id specifying gym.Env registered to gym.envs.registry. Valid format is "^(?:[\w:-]+\/)?([\w:.-]+)-v(\d+)$". See https://github.com/openai/gym/blob/v0.21.0/gym/envs/registration.py#L17-L19
- Returns
Environment
- Return type
gym.Env
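To see how an environment id splits into a name and a version under the format given for make, the pattern can be tried with Python's re module. This sketch assumes the pattern is the one in the linked gym v0.21.0 source; split_env_id is a hypothetical helper, and how make uses the version to achieve version tolerance is not shown.

```python
import re

# Pattern from the docstring above, with backslashes written out.
ENV_ID_RE = re.compile(r"^(?:[\w:-]+\/)?([\w:.-]+)-v(\d+)$")


def split_env_id(env_id):
    """Return (name, version) for a well-formed id, or None otherwise."""
    match = ENV_ID_RE.match(env_id)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


parsed = split_env_id("CartPole-v1")
namespaced = split_env_id("ALE/Pong-v5")
malformed = split_env_id("CartPole")
```

The optional leading group matches a namespace such as "ALE/" but does not capture it, so only the environment name and the integer version are returned.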