tf2rl.policies package
Submodules
tf2rl.policies.tfp_categorical_actor module
- class tf2rl.policies.tfp_categorical_actor.CategoricalActor(*args, **kwargs)
Bases: tensorflow.python.keras.engine.training.Model
- __init__(state_shape, action_dim, units=(256, 256), hidden_activation='relu', name='CategoricalActor')
- compute_prob(states)
- call(states, test=False)
Calls the model on new inputs.
In this case, call just reapplies all ops in the graph to the new inputs (i.e., builds a new computational graph from the provided inputs).
Note: this description is inherited from the Keras Model.call docstring, so the parameter names below (inputs, training, mask) do not match this method's signature, which takes states and test.
- Parameters
inputs – A tensor or list of tensors.
training – Boolean or boolean scalar tensor, indicating whether to run the Network in training mode or inference mode.
mask – A mask or list of masks. A mask can be either a tensor or None (no mask).
- Returns
A tensor if there is a single output, or a list of tensors if there is more than one output.
- compute_entropy(states)
- compute_log_probs(states, actions)
Computes the log probabilities of state-action pairs.
- Parameters
states – (tf.Tensor) Input states passed to the network.
actions – (tf.Tensor) Actions given as integer indices, not as one-hot vectors. They are converted to one-hot vectors inside this function.
- Returns
Log probabilities
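The one-hot conversion described for compute_log_probs can be illustrated with a minimal sketch. This is not the tf2rl implementation: it uses plain Python lists in place of tensors, and the helper names (one_hot, categorical_log_probs, categorical_entropy) and the eps smoothing term are assumptions introduced only for illustration.

```python
import math


def one_hot(action, action_dim):
    """Return a one-hot list for an integer action index (hypothetical helper)."""
    return [1.0 if i == action else 0.0 for i in range(action_dim)]


def categorical_log_probs(probs, actions, eps=1e-8):
    """Log probability of each integer action under its row of `probs`."""
    log_probs = []
    for row, action in zip(probs, actions):
        mask = one_hot(action, len(row))
        # The one-hot mask selects the probability of the chosen action.
        selected = sum(p * m for p, m in zip(row, mask))
        # eps (an assumption) guards against log(0).
        log_probs.append(math.log(selected + eps))
    return log_probs


def categorical_entropy(probs, eps=1e-8):
    """Shannon entropy of each row of action probabilities."""
    return [-sum(p * math.log(p + eps) for p in row) for row in probs]


# Two states, three discrete actions; actions are integer indices, not one-hot.
probs = [[0.7, 0.2, 0.1], [0.25, 0.25, 0.5]]
actions = [0, 2]
log_probs = categorical_log_probs(probs, actions)
entropies = categorical_entropy(probs)
```

With the actual class, the probabilities would presumably come from the network (for example via compute_prob), and the same quantities would be returned as tensors.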
- class tf2rl.policies.tfp_categorical_actor.CategoricalActorCritic(*args, **kwargs)
Bases: tf2rl.policies.tfp_categorical_actor.CategoricalActor
- __init__(*args, **kwargs)
- call(states, test=False)
Calls the model on new inputs.
In this case, call just reapplies all ops in the graph to the new inputs (i.e., builds a new computational graph from the provided inputs).
Note: this description is inherited from the Keras Model.call docstring, so the parameter names below (inputs, training, mask) do not match this method's signature, which takes states and test.
- Parameters
inputs – A tensor or list of tensors.
training – Boolean or boolean scalar tensor, indicating whether to run the Network in training mode or inference mode.
mask – A mask or list of masks. A mask can be either a tensor or None (no mask).
- Returns
A tensor if there is a single output, or a list of tensors if there is more than one output.
tf2rl.policies.tfp_gaussian_actor module
- class tf2rl.policies.tfp_gaussian_actor.GaussianActor(*args, **kwargs)
Bases: tensorflow.python.keras.engine.training.Model
- LOG_STD_CAP_MAX = 2
- LOG_STD_CAP_MIN = -20
- EPS = 1e-06
- __init__(state_shape, action_dim, max_action, units=(256, 256), hidden_activation='relu', state_independent_std=False, squash=False, name='gaussian_policy')
- call(states, test=False)
Computes actions and the log probabilities of the selected actions.
- compute_log_probs(states, actions)
- compute_entropy(states)
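This reference does not say how LOG_STD_CAP_MAX, LOG_STD_CAP_MIN, and EPS are used. The sketch below shows one common pattern for Gaussian policies, offered as an assumption rather than as the tf2rl implementation: the log standard deviation is clipped to the cap range, and when actions are squashed with tanh, EPS keeps the change-of-variables correction finite. The function names are hypothetical, and scaling by max_action is not modeled.

```python
import math

# Values copied from the class attributes listed above.
LOG_STD_CAP_MAX = 2
LOG_STD_CAP_MIN = -20
EPS = 1e-06


def clip_log_std(log_std):
    """Clamp a log standard deviation to the cap range (assumed usage)."""
    return max(LOG_STD_CAP_MIN, min(LOG_STD_CAP_MAX, log_std))


def gaussian_log_prob(raw_action, mean, log_std):
    """Log density of a diagonal Gaussian, summed over action dimensions."""
    total = 0.0
    for x, mu, ls in zip(raw_action, mean, log_std):
        ls = clip_log_std(ls)
        z = (x - mu) / math.exp(ls)
        total += -0.5 * z * z - ls - 0.5 * math.log(2.0 * math.pi)
    return total


def squashed_log_prob(raw_action, mean, log_std):
    """Log density of tanh(raw_action) via the change-of-variables formula."""
    squashed = [math.tanh(x) for x in raw_action]
    # EPS keeps the logarithm finite when |tanh(x)| is very close to 1.
    correction = sum(math.log(1.0 - a * a + EPS) for a in squashed)
    return squashed, gaussian_log_prob(raw_action, mean, log_std) - correction


action, log_prob = squashed_log_prob([0.0], [0.0], [0.0])
```

Clipping the log standard deviation keeps the standard deviation between exp(-20) and exp(2), which avoids both degenerate and overly wide action distributions.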