random
Source code: tianshou/policy/random.py
- class MARLRandomTrainingStats(*, train_time: float = 0.0, smoothed_loss: dict = <factory>)
Bases: TrainingStats
- class MARLRandomPolicy(*, action_space: Space, observation_space: Space | None = None, action_scaling: bool = False, action_bound_method: Literal['clip', 'tanh'] | None = 'clip', lr_scheduler: LRScheduler | MultipleLRSchedulers | None = None)
Bases: BasePolicy[TMARLRandomTrainingStats]
A random agent used in multi-agent learning.
It chooses an action uniformly at random from the legal actions.
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(batch: ObsBatchProtocol, state: dict | BatchProtocol | ndarray | None = None, **kwargs: Any) → ActBatchProtocol
Compute the random action over the given batch data.
The input should contain a mask in batch.obs, where “True” marks an available action and “False” an unavailable one. For example,
batch.obs.mask == np.array([[False, True, False]]) means that, with batch size 1, action “1” is available but actions “0” and “2” are unavailable.
Returns:
A Batch with an “act” key containing the random action.
See also
Please refer to forward() for a more detailed explanation.
- learn(batch: RolloutBatchProtocol, *args: Any, **kwargs: Any) → TMARLRandomTrainingStats
Since a random agent learns nothing, it performs no update and returns empty training stats.