torch_utils


torch_train_mode(module: Module, enabled: bool = True) → Iterator[None]

Temporarily switch to module.training=enabled, affecting things like BatchNormalization.
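A minimal usage sketch (assuming torch_train_mode is importable from tianshou.utils.torch_utils, and using a toy network defined here purely for illustration):

    import torch
    from tianshou.utils.torch_utils import torch_train_mode

    net = torch.nn.Sequential(torch.nn.Linear(4, 8), torch.nn.BatchNorm1d(8))
    net.train()  # module starts out in training mode

    # Temporarily switch to evaluation mode, e.g. for inference;
    # the previous training flag is restored when the block exits.
    with torch_train_mode(net, enabled=False):
        out = net(torch.zeros(1, 4))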

policy_within_training_step(policy: BasePolicy, enabled: bool = True) → Iterator[None]

Temporarily switch to policy.is_within_training_step=enabled.

Enabling this ensures that the policy is able to adapt its behavior, allowing it to differentiate between training and inference/evaluation, e.g., to sample actions instead of using the most probable action (where applicable).

Note that for rollout, which also happens within a training step, one would usually want the wrapped torch module to be in evaluation mode, which can be achieved using with torch_train_mode(policy, False). For subsequent gradient updates, the policy should be both within the training step and in torch train mode, as shown in the sketch below.
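A sketch of the pattern just described, assuming an already constructed policy (a BasePolicy wrapping a torch module) and hypothetical collect_rollout / update_step helpers that merely stand in for the actual data-collection and update logic:

    from tianshou.utils.torch_utils import policy_within_training_step, torch_train_mode

    with policy_within_training_step(policy):
        # Rollout / data collection: inside the training step, but the
        # wrapped torch module should be in evaluation mode.
        with torch_train_mode(policy, False):
            collect_rollout(policy)  # hypothetical helper

        # Gradient updates: within the training step AND in torch train mode.
        with torch_train_mode(policy):
            update_step(policy)  # hypothetical helper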

create_uniform_action_dist(action_space: Box, batch_size: int = 1) → Uniform
create_uniform_action_dist(action_space: Discrete, batch_size: int = 1) → Categorical

Create a Distribution such that sampling from it is equivalent to sampling a batch with action_space.sample().

Parameters:
  • action_space – The action space of the environment.

  • batch_size – The number of environments or batch size for sampling.

Returns:

A PyTorch distribution for sampling actions.
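
A usage sketch, assuming gymnasium action spaces; per the description above, sampling from the returned distribution should match drawing a batch of samples with action_space.sample():

    import numpy as np
    from gymnasium.spaces import Box, Discrete
    from tianshou.utils.torch_utils import create_uniform_action_dist

    # Continuous case: a Uniform distribution over the box bounds.
    box = Box(low=-1.0, high=1.0, shape=(3,), dtype=np.float32)
    box_dist = create_uniform_action_dist(box, batch_size=4)
    box_actions = box_dist.sample()  # expected shape: (4, 3), values in [-1, 1]

    # Discrete case: a Categorical distribution with equal class probabilities.
    disc = Discrete(5)
    disc_dist = create_uniform_action_dist(disc, batch_size=4)
    disc_actions = disc_dist.sample()  # expected shape: (4,), values in {0, ..., 4}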