torch_utils


torch_train_mode(module: Module, enabled: bool = True) → Iterator[None]

Temporarily sets module.training to enabled, which affects layers such as batch normalization.
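A context manager with this behavior could be sketched as follows. This is a minimal illustration, not the library's implementation: train_mode_sketch and TinyModule are hypothetical names, and TinyModule is a stand-in that mimics only the training attribute and train(mode) method of torch.nn.Module so that the snippet runs without PyTorch.

```python
from contextlib import contextmanager
from typing import Iterator


class TinyModule:
    """Hypothetical stand-in mimicking the training flag of torch.nn.Module."""

    def __init__(self) -> None:
        self.training = True

    def train(self, mode: bool = True) -> "TinyModule":
        self.training = mode
        return self


@contextmanager
def train_mode_sketch(module, enabled: bool = True) -> Iterator[None]:
    """Set module.training to `enabled` and restore the previous value on exit."""
    previous = module.training
    try:
        module.train(enabled)
        yield
    finally:
        # Restore the earlier mode even if the body raised an exception.
        module.train(previous)


net = TinyModule()
with train_mode_sketch(net, False):
    inside = net.training  # evaluation mode inside the block
after = net.training  # original mode restored after the block
```

Restoring the previous value, rather than unconditionally switching back to train mode, keeps nested uses well-behaved.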

policy_within_training_step(policy: Policy, enabled: bool = True) → Iterator[None]

Temporarily switch to policy.is_within_training_step=enabled.

Enabling this allows the policy to adapt its behavior and to differentiate between training and inference/evaluation, e.g., to sample actions instead of using the most probable action (where applicable).

Note that for rollout, which also happens within a training step, one would usually want the wrapped torch module to be in evaluation mode, which can be achieved using with torch_train_mode(policy, False). For subsequent gradient updates, the policy should be both within a training step and in torch train mode.
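The interplay of the two flags during rollout and gradient updates can be illustrated with a small sketch. It assumes a hypothetical stand-in TinyPolicy and a hypothetical generic helper set_flag so that it runs without PyTorch or the library. In real code, the outer block would correspond to policy_within_training_step(policy) and the inner blocks to torch_train_mode(policy, False) and torch_train_mode(policy, True).

```python
from contextlib import contextmanager
from typing import Iterator


class TinyPolicy:
    """Hypothetical stand-in carrying only the two flags discussed above."""

    def __init__(self) -> None:
        self.training = False
        self.is_within_training_step = False


@contextmanager
def set_flag(obj, name: str, value: bool) -> Iterator[None]:
    """Temporarily set attribute `name` on `obj`, restoring it on exit."""
    previous = getattr(obj, name)
    setattr(obj, name, value)
    try:
        yield
    finally:
        setattr(obj, name, previous)


policy = TinyPolicy()
phases = {}

with set_flag(policy, "is_within_training_step", True):
    # Rollout: within a training step, but the module is in evaluation mode.
    with set_flag(policy, "training", False):
        phases["rollout"] = (policy.is_within_training_step, policy.training)
    # Gradient update: within a training step and in train mode.
    with set_flag(policy, "training", True):
        phases["update"] = (policy.is_within_training_step, policy.training)

# Both flags are back to their initial values outside the blocks.
phases["outside"] = (policy.is_within_training_step, policy.training)
```

Each entry in phases is a pair (is_within_training_step, training) recorded for that phase.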

create_uniform_action_dist(action_space: Box, batch_size: int = 1) → Uniform
create_uniform_action_dist(action_space: Discrete, batch_size: int = 1) → Categorical

Create a Distribution such that sampling from it is equivalent to sampling a batch with action_space.sample().

Parameters:
  • action_space – The environment’s action space.

  • batch_size – The number of environments or batch size for sampling.

Returns:

A PyTorch distribution for sampling actions.

torch_device(module: Module) → device

Gets the device of a torch module by retrieving the device of its parameters.

If the module has no parameters, the CPU device is returned as a fallback.
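The lookup-with-fallback logic described above could look roughly like the following sketch. It is an assumption about the mechanism, not the library's code: device_sketch is a hypothetical name, the stand-in objects only provide a parameters() method, and devices are represented as plain strings instead of torch.device objects so that the snippet runs without PyTorch.

```python
from types import SimpleNamespace


def device_sketch(module, fallback: str = "cpu"):
    """Return the device of the first parameter, or `fallback` if there is none."""
    try:
        return next(iter(module.parameters())).device
    except StopIteration:
        # The module has no parameters, so no device can be inferred from them.
        return fallback


# Hypothetical stand-ins: one module with a single parameter, one without any.
with_params = SimpleNamespace(
    parameters=lambda: iter([SimpleNamespace(device="cuda:0")])
)
without_params = SimpleNamespace(parameters=lambda: iter([]))

device_a = device_sketch(with_params)
device_b = device_sketch(without_params)
```

Only the first parameter is inspected, so a module whose parameters are spread across several devices would report just one of them under this sketch.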