imitation_base
Source code: tianshou/algorithm/imitation/imitation_base.py
- class ImitationTrainingStats(*, train_time: float = 0.0, smoothed_loss: dict = <factory>, loss: float = 0.0)
Bases: TrainingStats
- loss: float = 0.0
- class ImitationPolicy(*, actor: Module, action_space: Space, observation_space: Space | None = None, action_scaling: bool = False, action_bound_method: Literal['clip', 'tanh'] | None = 'clip')
Bases: Policy
- Parameters:
actor – a model that maps states to actions (s -> a).
action_space – the environment’s action space.
observation_space – the environment’s observation space.
action_scaling – flag indicating whether, for continuous action spaces, actions should be scaled from the standard neural network output range [-1, 1] to the environment’s action space range [action_space.low, action_space.high]. This applies to continuous action spaces only (gym.spaces.Box) and has no effect for discrete spaces. When enabled, policy outputs are expected to be in the normalized range [-1, 1] (after bounding), and are then linearly transformed to the actual required range. This improves neural network training stability, allows the same algorithm to work across environments with different action ranges, and standardizes exploration strategies. Should be disabled if the actor model already produces outputs in the correct range.
action_bound_method – the method used for bounding actions in continuous action spaces to the range [-1, 1] before scaling them to the environment’s action space (provided that action_scaling is enabled). This applies to continuous action spaces only (gym.spaces.Box) and should be set to None for discrete spaces. When set to “clip”, actions exceeding the [-1, 1] range are simply clipped to this range. When set to “tanh”, a hyperbolic tangent function is applied, which smoothly constrains outputs to [-1, 1] while preserving gradients. The choice of bounding method affects both training dynamics and exploration behavior. Clipping provides hard boundaries but may create plateau regions in the gradient landscape, while tanh provides smoother transitions but can compress sensitivity near the boundaries. Should be set to None if the actor model inherently produces bounded outputs. Typically used together with action_scaling=True.
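The interaction between action_bound_method and action_scaling described above can be sketched in plain Python. This is a minimal illustration of the described behavior, not Tianshou's implementation: the function name bound_and_scale and its list-based signature are hypothetical, and low/high stand in for action_space.low and action_space.high.

```python
import math
from typing import Literal, Optional, Sequence


def bound_and_scale(
    raw_action: Sequence[float],
    low: Sequence[float],
    high: Sequence[float],
    action_scaling: bool = True,
    action_bound_method: Optional[Literal["clip", "tanh"]] = "clip",
) -> list[float]:
    """Hypothetical helper: bound each component to [-1, 1], then map to [low, high]."""
    if action_bound_method == "clip":
        # Hard boundary: values outside [-1, 1] are cut off.
        bounded = [max(-1.0, min(1.0, a)) for a in raw_action]
    elif action_bound_method == "tanh":
        # Smooth boundary: tanh squashes any real value into (-1, 1).
        bounded = [math.tanh(a) for a in raw_action]
    else:
        # None: assume the actor already produces bounded outputs.
        bounded = list(raw_action)

    if not action_scaling:
        return bounded

    # Linear map from [-1, 1] to [low, high]: -1 -> low, +1 -> high.
    return [lo + (hi - lo) * (a + 1.0) / 2.0 for a, lo, hi in zip(bounded, low, high)]


clipped = bound_and_scale([2.0, -0.5], low=[0.0, 0.0], high=[10.0, 4.0])
print(clipped)  # [10.0, 1.0]
```

With "clip", the out-of-range component 2.0 is first cut to 1.0 and then mapped to the upper bound 10.0; the in-range component -0.5 maps linearly to 1.0.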
- forward(batch: ObsBatchProtocol, state: dict | BatchProtocol | ndarray | None = None, **kwargs: Any) -> ModelOutputBatchProtocol
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
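The difference between calling the Module instance and calling forward directly can be seen with a forward hook. The sketch below uses a plain torch.nn.Module; TinyActor is a hypothetical stand-in and not a Tianshou class.

```python
import torch
from torch import nn

calls = []


class TinyActor(nn.Module):
    """Hypothetical toy module used only to demonstrate hook behavior."""

    def __init__(self) -> None:
        super().__init__()
        self.linear = nn.Linear(4, 2)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.linear(obs)


actor = TinyActor()
# The hook records each invocation that goes through Module.__call__.
handle = actor.register_forward_hook(lambda module, inputs, output: calls.append("hook"))

obs = torch.zeros(1, 4)
actor(obs)          # runs forward() and the registered hook
actor.forward(obs)  # runs forward() only; the hook is skipped
handle.remove()

hook_count = len(calls)
print(hook_count)  # 1
```

Only the first call triggers the hook, which is why the instance should be called rather than forward.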
- class OffPolicyImitationLearning(*, policy: ImitationPolicy, optim: OptimizerFactory)
Bases: OffPolicyAlgorithm[ImitationPolicy], ImitationLearningAlgorithmMixin
Implementation of off-policy vanilla imitation learning.
- Parameters:
policy – the policy
optim – the optimizer factory
- class OfflineImitationLearning(*, policy: ImitationPolicy, optim: OptimizerFactory)
Bases: OfflineAlgorithm[ImitationPolicy], ImitationLearningAlgorithmMixin
Implementation of offline vanilla imitation learning.
- Parameters:
policy – the policy
optim – the optimizer factory
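Vanilla imitation learning is commonly realized as behavior cloning: the policy is fit by supervised learning to reproduce expert actions for given states. The sketch below illustrates that idea only and is not the code of the classes documented here. It assumes a one-dimensional linear policy and a mean-squared-error loss; fit_linear_policy and the toy data are hypothetical, and the question of where the data comes from (off-policy collection versus a fixed offline dataset) is left out.

```python
def fit_linear_policy(states, expert_actions, lr=0.1, epochs=500):
    """Fit a = w * s + b to expert (state, action) pairs by gradient descent on the MSE."""
    w, b = 0.0, 0.0
    n = len(states)
    for _ in range(epochs):
        # Gradients of mean((w * s + b - a)^2) with respect to w and b.
        grad_w = sum(2.0 * (w * s + b - a) * s for s, a in zip(states, expert_actions)) / n
        grad_b = sum(2.0 * (w * s + b - a) for s, a in zip(states, expert_actions)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy expert that acts according to a = 2 * s - 1.
states = [-1.0, -0.5, 0.0, 0.5, 1.0]
expert_actions = [2.0 * s - 1.0 for s in states]

w, b = fit_linear_policy(states, expert_actions)
print(round(w, 3), round(b, 3))
```

After training, w and b approach the expert's coefficients 2 and -1, so the cloned policy reproduces the expert on these states.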