torch_utils
Source code: tianshou/utils/torch_utils.py
- torch_train_mode(module: Module, enabled: bool = True) → Iterator[None]
Temporarily set module.training = enabled, affecting layers whose behavior depends on the mode, such as batch normalization; the original mode is restored when the context exits.
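A minimal usage sketch (the toy network is illustrative, not part of tianshou):

```python
import torch

from tianshou.utils.torch_utils import torch_train_mode

# Toy module containing a mode-dependent layer (illustrative only).
net = torch.nn.Sequential(torch.nn.Linear(4, 8), torch.nn.BatchNorm1d(8))
net.train()  # start in training mode

with torch_train_mode(net, enabled=False):
    # Inside the block the module is in evaluation mode, so BatchNorm1d
    # uses its running statistics rather than per-batch statistics.
    assert not net.training
assert net.training  # the original mode is restored on exit
```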
- policy_within_training_step(policy: BasePolicy, enabled: bool = True) → Iterator[None]
Temporarily set policy.is_within_training_step = enabled.
Enabling this ensures that the policy is able to adapt its behavior, allowing it to differentiate between training and inference/evaluation, e.g., to sample actions instead of using the most probable action (where applicable). Note that for rollout, which also happens within a training step, one would usually want the wrapped torch module to be in evaluation mode, which can be achieved using with torch_train_mode(policy, False). For subsequent gradient updates, the policy should be both within the training step and in torch train mode, as sketched below.
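The combination described above might look as follows; this is a sketch assuming an already-constructed policy object (any BasePolicy, which is itself a torch Module), with the actual rollout and update logic omitted:

```python
from tianshou.utils.torch_utils import policy_within_training_step, torch_train_mode

with policy_within_training_step(policy):
    # Rollout: still within the training step, but the wrapped torch
    # module is switched to evaluation mode for action computation.
    with torch_train_mode(policy, enabled=False):
        ...  # e.g., collect transitions with a Collector
    # Gradient updates: within the training step and in torch train mode.
    with torch_train_mode(policy):
        ...  # e.g., run the policy's learning/update step
```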
- create_uniform_action_dist(action_space: Box, batch_size: int = 1) → Uniform
- create_uniform_action_dist(action_space: Discrete, batch_size: int = 1) → Categorical
Create a Distribution such that sampling from it is equivalent to sampling a batch with action_space.sample().
- Parameters:
action_space – The action space of the environment.
batch_size – The batch size for sampling, typically the number of parallel environments.
- Returns:
A PyTorch distribution for sampling actions.
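For example, with a continuous action space (the space bounds and batch size here are arbitrary illustrations):

```python
from gymnasium.spaces import Box

from tianshou.utils.torch_utils import create_uniform_action_dist

# Hypothetical 3-dimensional continuous action space with bounds [-1, 1].
space = Box(low=-1.0, high=1.0, shape=(3,))
dist = create_uniform_action_dist(space, batch_size=4)
actions = dist.sample()  # expected shape: (4, 3), uniform over [-1, 1]
```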