tf2rl.policies package

Submodules

class tf2rl.policies.tfp_categorical_actor.CategoricalActor(*args, **kwargs)

Bases: tensorflow.python.keras.engine.training.Model

__init__(state_shape, action_dim, units=(256, 256), hidden_activation='relu', name='CategoricalActor')

call(states, test=False)

Calls the model on new inputs.

In this case, call simply reapplies all ops in the graph to the new inputs (i.e., it builds a new computational graph from the provided inputs).

Parameters

inputs – A tensor or list of tensors.
training – Boolean or boolean scalar tensor, indicating whether to run the Network in training mode or inference mode.
mask – A mask or list of masks. A mask can be either a tensor or None (no mask).

Returns

A tensor if there is a single output, or a list of tensors if there is more than one output.

compute_log_probs(states, actions)

Compute log probabilities of state-action pairs

Parameters

states – tf.Tensor of inputs to the neural network.
actions – tf.Tensor of actions that are not one-hot encoded. They are converted to one-hot vectors inside this function.

Returns

Log probabilities
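
The one-hot conversion described for compute_log_probs can be illustrated without TensorFlow. The following is a minimal sketch in plain Python, assuming the actor produces one row of action probabilities per state. The helper names one_hot and log_probs_of_actions and the small eps constant are hypothetical and are not part of tf2rl.

```python
import math


def one_hot(index, depth):
    """Return a one-hot list of length `depth` with 1.0 at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(depth)]


def log_probs_of_actions(probs, actions, eps=1e-8):
    """Return the log probability of each chosen action.

    probs:   list of rows, each a probability distribution over actions
             (assumed here to be what the actor network outputs).
    actions: list of integer action indices, NOT one-hot vectors.
    eps:     hypothetical small constant that avoids log(0).
    """
    result = []
    for row, action in zip(probs, actions):
        mask = one_hot(action, len(row))
        # The one-hot mask selects the probability of the chosen action.
        selected = sum(p * m for p, m in zip(row, mask))
        result.append(math.log(selected + eps))
    return result


# Two states, three discrete actions.
probs = [[0.7, 0.2, 0.1], [0.25, 0.25, 0.5]]
actions = [0, 2]
log_probs = log_probs_of_actions(probs, actions)
```

Here log_probs holds one value per state-action pair, which matches the "Log probabilities" return value in shape, though the library itself operates on tensors rather than lists.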

class tf2rl.policies.tfp_categorical_actor.CategoricalActorCritic(*args, **kwargs)

call(states, test=False)

Calls the model on new inputs.

In this case, call simply reapplies all ops in the graph to the new inputs (i.e., it builds a new computational graph from the provided inputs).

Parameters

inputs – A tensor or list of tensors.
training – Boolean or boolean scalar tensor, indicating whether to run the Network in training mode or inference mode.
mask – A mask or list of masks. A mask can be either a tensor or None (no mask).

Returns

A tensor if there is a single output, or a list of tensors if there is more than one output.

class tf2rl.policies.tfp_gaussian_actor.GaussianActor(*args, **kwargs)

Bases: tensorflow.python.keras.engine.training.Model

__init__(state_shape, action_dim, max_action, units=(256, 256), hidden_activation='relu', state_independent_std=False, squash=False, name='gaussian_policy')

call(states, test=False): Compute actions and log probabilities of the selected action