imitation_base

Source code: tianshou/algorithm/imitation/imitation_base.py

class ImitationTrainingStats(*, train_time: float = 0.0, smoothed_loss: dict = <factory>, loss: float = 0.0)

Bases: TrainingStats

loss: float = 0.0

class ImitationPolicy(*, actor: Module, action_space: Space, observation_space: Space | None = None, action_scaling: bool = False, action_bound_method: Literal['clip', 'tanh'] | None = 'clip')

Bases: Policy

Parameters:

actor – a model that maps observations (states) to actions (s -> a).
action_space – the environment’s action space.
observation_space – the environment’s observation space.
action_scaling – flag indicating whether, for continuous action spaces, actions should be scaled from the standard neural network output range [-1, 1] to the environment’s action space range [action_space.low, action_space.high]. This applies to continuous action spaces only (gym.spaces.Box) and has no effect for discrete spaces. When enabled, policy outputs are expected to be in the normalized range [-1, 1] (after bounding), and are then linearly transformed to the actual required range. This improves neural network training stability, allows the same algorithm to work across environments with different action ranges, and standardizes exploration strategies. Should be disabled if the actor model already produces outputs in the correct range.
action_bound_method – the method used for bounding actions in continuous action spaces to the range [-1, 1] before scaling them to the environment’s action space (provided that action_scaling is enabled). This applies to continuous action spaces only (gym.spaces.Box) and should be set to None for discrete spaces. When set to “clip”, actions exceeding the [-1, 1] range are simply clipped to this range. When set to “tanh”, a hyperbolic tangent function is applied, which smoothly constrains outputs to [-1, 1] while preserving gradients. The choice of bounding method affects both training dynamics and exploration behavior. Clipping provides hard boundaries but may create plateau regions in the gradient landscape, while tanh provides smoother transitions but can compress sensitivity near the boundaries. Should be set to None if the actor model inherently produces bounded outputs. Typically used together with action_scaling=True.
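
The interaction between action_bound_method and action_scaling described above can be sketched for a single scalar action. This is an illustration of the described behaviour under stated assumptions, not Tianshou's implementation: the helper names bound_action, scale_action and to_env_action are hypothetical, and the linear map low + (high - low) * (a + 1) / 2 is the standard affine transformation from [-1, 1] to [low, high].

```python
import math


def bound_action(raw, method="clip"):
    """Bound a raw actor output to [-1, 1] (hypothetical helper)."""
    if method == "clip":
        # Hard boundary: values outside [-1, 1] are cut off
        return max(-1.0, min(1.0, raw))
    if method == "tanh":
        # Smooth squashing into (-1, 1)
        return math.tanh(raw)
    if method is None:
        # The actor is assumed to produce suitably bounded outputs already
        return raw
    raise ValueError(f"unknown bound method: {method!r}")


def scale_action(bounded, low, high):
    """Linearly map a value in [-1, 1] to [low, high] (hypothetical helper)."""
    return low + (high - low) * (bounded + 1.0) / 2.0


def to_env_action(raw, low, high, action_scaling=True, action_bound_method="clip"):
    """Bound first, then optionally scale to the environment's range."""
    action = bound_action(raw, action_bound_method)
    if action_scaling:
        action = scale_action(action, low, high)
    return action


# A raw output of 3.0 is clipped to 1.0, which maps to the upper bound of [0, 10]
clipped = to_env_action(3.0, low=0.0, high=10.0)
# With tanh, the same raw output lands strictly inside the range
squashed = to_env_action(3.0, low=0.0, high=10.0, action_bound_method="tanh")
```

With clipping, every raw output above 1 maps to the same environment action, which is the plateau effect mentioned above; with tanh, large outputs remain distinguishable but are compressed near the boundary.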

forward(batch: ObsBatchProtocol, state: dict | BatchProtocol | ndarray | None = None, **kwargs: Any) → ModelOutputBatchProtocol

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class ImitationLearningAlgorithmMixin

Bases: object

class OffPolicyImitationLearning(*, policy: ImitationPolicy, optim: OptimizerFactory)

Bases: OffPolicyAlgorithm[ImitationPolicy], ImitationLearningAlgorithmMixin

Implementation of off-policy vanilla imitation learning.

Parameters:

policy – the imitation policy to train (an ImitationPolicy).
optim – the factory used to create the optimizer for the policy’s parameters.

class OfflineImitationLearning(*, policy: ImitationPolicy, optim: OptimizerFactory)

Bases: OfflineAlgorithm[ImitationPolicy], ImitationLearningAlgorithmMixin

Implementation of offline vanilla imitation learning.

Parameters:

policy – the imitation policy to train (an ImitationPolicy).
optim – the factory used to create the optimizer for the policy’s parameters.
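
Vanilla imitation learning is commonly realised as behavior cloning: the policy is fitted to expert (state, action) pairs by supervised learning. The following is a minimal, dependency-free sketch of that idea for a one-dimensional linear policy trained with a mean squared error loss. It is not Tianshou's implementation; the function names, the linear policy form, and the toy expert data are all assumptions made for illustration.

```python
def mse_loss(w, b, states, expert_actions):
    """Mean squared error between predicted and expert actions."""
    errors = [(w * s + b) - a for s, a in zip(states, expert_actions)]
    return sum(e * e for e in errors) / len(errors)


def fit_linear_policy(states, expert_actions, lr=0.1, epochs=500):
    """Fit a(s) = w * s + b to expert data by full-batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(states)
    for _ in range(epochs):
        grad_w, grad_b = 0.0, 0.0
        for s, a in zip(states, expert_actions):
            err = (w * s + b) - a
            # Gradients of the mean squared error with respect to w and b
            grad_w += 2.0 * err * s / n
            grad_b += 2.0 * err / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy demonstrations from a hypothetical expert that acts as a = 2 * s + 0.5
states = [-1.0, -0.5, 0.0, 0.5, 1.0]
expert_actions = [2.0 * s + 0.5 for s in states]

w, b = fit_linear_policy(states, expert_actions)
final_loss = mse_loss(w, b, states, expert_actions)
```

In this sketch the dataset is fixed up front, which corresponds to the offline setting; an off-policy variant would instead draw the (state, action) pairs from a buffer that keeps being filled during training.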
