iqn

Source code: tianshou/policy/modelfree/iqn.py

class IQNPolicy(*, model: Module, optim: Optimizer, action_space: Discrete, discount_factor: float = 0.99, sample_size: int = 32, online_sample_size: int = 8, target_sample_size: int = 8, num_quantiles: int = 200, estimation_step: int = 1, target_update_freq: int = 0, reward_normalization: bool = False, is_double: bool = True, clip_loss_grad: bool = False, observation_space: Space | None = None, lr_scheduler: LRScheduler | MultipleLRSchedulers | None = None)

Implementation of Implicit Quantile Network. arXiv:1806.06923.
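The "implicit" part of IQN is that quantile fractions τ are sampled from U(0, 1) and embedded by the network, instead of being fixed in advance. Below is a minimal, dependency-free sketch of that idea, not tianshou's code. The helper names (`sample_taus`, `cosine_basis`, `embed_tau`, `mix`) and the toy weights are hypothetical, and letting the cosine index run over i = 1..num_cosines is one possible choice among several.

```python
import math
import random


def sample_taus(sample_size: int, rng: random.Random) -> list[float]:
    """Draw `sample_size` quantile fractions tau ~ U(0, 1)."""
    return [rng.random() for _ in range(sample_size)]


def cosine_basis(tau: float, num_cosines: int) -> list[float]:
    """Cosine features cos(pi * i * tau) for i = 1..num_cosines."""
    return [math.cos(math.pi * i * tau) for i in range(1, num_cosines + 1)]


def embed_tau(tau: float, weights: list[list[float]], bias: list[float]) -> list[float]:
    """phi(tau) = ReLU(W @ basis(tau) + b), with W of shape (embed_dim, num_cosines)."""
    basis = cosine_basis(tau, len(weights[0]))
    return [
        max(0.0, sum(w * c for w, c in zip(row, basis)) + b)
        for row, b in zip(weights, bias)
    ]


def mix(state_features: list[float], tau_embedding: list[float]) -> list[float]:
    """Combine state features and the tau embedding by an element-wise product."""
    return [s * t for s, t in zip(state_features, tau_embedding)]


if __name__ == "__main__":
    rng = random.Random(0)
    weights = [[0.5, -0.2, 0.1], [0.3, 0.3, -0.4]]  # toy values; learned in practice
    bias = [0.1, 0.0]
    state_features = [1.0, 2.0]  # stand-in for the observation encoder's output
    for tau in sample_taus(4, rng):
        print(f"tau={tau:.3f}", mix(state_features, embed_tau(tau, weights, bias)))
```

In a real network, W and b are learned, and the mixed features pass through further layers that output one value per action for each sampled τ.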

Parameters:

model – a model following the rules in BasePolicy (s -> logits).
optim – a torch.optim.Optimizer for optimizing the model.
discount_factor – the discount factor, in [0, 1].
sample_size – the number of quantile fractions τ sampled for policy evaluation.
online_sample_size – the number of τ samples for the online model during training.
target_sample_size – the number of τ samples for the target model during training.
num_quantiles – the number of quantile midpoints in the inverse cumulative distribution function of the value.
estimation_step – the number of steps to look ahead (n in the n-step return).
target_update_freq – the target network update frequency (0 if you do not use the target network).
reward_normalization – normalize the returns to Normal(0, 1). TODO: rename to return_normalization?
is_double – whether to use Double DQN.
clip_loss_grad – clip the gradient of the loss in accordance with nature14236; this amounts to using the Huber loss instead of the MSE loss.
observation_space – the environment’s observation space.
lr_scheduler – if not None, it will be called in policy.update().
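The roles of discount_factor, estimation_step, online_sample_size and target_sample_size can be made concrete with a dependency-free sketch of the quantile Huber loss from the IQN paper, for one transition and one action. Here, online_sample_size quantile values from the online model are regressed toward target_sample_size n-step targets built from the target model's quantile samples. The function names are hypothetical. Terminal states, batching, Double DQN action selection and gradients are left out, and κ = 1 is assumed. This is not tianshou's implementation.

```python
def n_step_return(rewards: list[float], gamma: float) -> float:
    """Discounted sum r_t + gamma * r_{t+1} + ... over `estimation_step` rewards."""
    return sum(gamma**k * r for k, r in enumerate(rewards))


def td_targets(rewards: list[float], next_quantiles: list[float], gamma: float) -> list[float]:
    """Shift each target-network quantile sample by the n-step return (non-terminal case)."""
    n = len(rewards)
    g = n_step_return(rewards, gamma)
    return [g + gamma**n * z for z in next_quantiles]


def huber(x: float, kappa: float = 1.0) -> float:
    """Huber loss: quadratic inside [-kappa, kappa], linear outside."""
    return 0.5 * x * x if abs(x) <= kappa else kappa * (abs(x) - 0.5 * kappa)


def quantile_huber_loss(
    online_quantiles: list[float],
    online_taus: list[float],
    targets: list[float],
    kappa: float = 1.0,
) -> float:
    """Sum over online samples i of the mean over targets j of
    |tau_i - 1{delta_ij < 0}| * huber(delta_ij) / kappa, with delta_ij = target_j - z_i."""
    total = 0.0
    for z_i, tau_i in zip(online_quantiles, online_taus):
        pair_losses = []
        for t_j in targets:
            delta = t_j - z_i
            # Asymmetric weight: this is what makes z_i track the tau_i-quantile.
            weight = abs(tau_i - (1.0 if delta < 0 else 0.0))
            pair_losses.append(weight * huber(delta, kappa) / kappa)
        total += sum(pair_losses) / len(pair_losses)
    return total


if __name__ == "__main__":
    rewards = [1.0, 0.0, 1.0]  # estimation_step = 3
    targets = td_targets(rewards, next_quantiles=[0.5, 1.5], gamma=0.99)  # target_sample_size = 2
    loss = quantile_huber_loss([1.0, 2.0, 3.0], [0.2, 0.5, 0.8], targets)  # online_sample_size = 3
    print(targets, loss)
```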

Please refer to QRDQNPolicy for a more detailed explanation.

forward(batch: ObsBatchProtocol, state: dict | BatchProtocol | ndarray | None = None, model: Literal['model', 'model_old'] = 'model', **kwargs: Any) → QuantileRegressionBatchProtocol

Compute actions over the given batch data.

To mask actions, add a “mask” entry to batch.obs. For example, for an environment with three actions (0, 1 and 2):

batch = Batch(
    obs=Batch(
        obs="original obs, with batch_size=1 for demonstration",
        mask=np.array([[False, True, False]]),
        # action 1 is available
        # action 0 and 2 are unavailable
    ),
    ...
)
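The effect of such a mask on greedy action selection can be sketched as follows. This is a hypothetical helper, not tianshou's code. It assumes the Q-values are already computed per action, and it excludes unavailable actions by setting their value to -inf before the argmax. Other implementations may subtract a large constant instead.

```python
import math


def masked_greedy_action(q_values: list[float], mask: list[bool]) -> int:
    """Return the argmax over available actions; mask[a] is True when action a is allowed."""
    if not any(mask):
        raise ValueError("at least one action must be available")
    # Unavailable actions can never win the argmax.
    masked = [q if allowed else -math.inf for q, allowed in zip(q_values, mask)]
    return max(range(len(masked)), key=masked.__getitem__)


if __name__ == "__main__":
    # Same mask as in the example above: only action 1 is available.
    print(masked_greedy_action([5.0, 1.0, 3.0], [False, True, False]))
```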

Returns:

A Batch which has 3 keys:

act – the action.
logits – the network’s raw output.
state – the hidden state.
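For a distributional policy like this one, the greedy action is usually obtained by averaging each action's sampled quantile values into a Q-value estimate and then taking the argmax. The sketch below assumes per-observation logits laid out as [num_actions][sample_size]. The layout and the helper names are assumptions for illustration, not the documented shape of `logits`.

```python
def q_from_quantiles(quantile_logits: list[list[float]]) -> list[float]:
    """Estimate Q(s, a) as the mean of the sampled quantile values of each action."""
    return [sum(samples) / len(samples) for samples in quantile_logits]


def greedy_actions(batch_logits: list[list[list[float]]]) -> list[int]:
    """Pick the action with the highest mean quantile value for every observation."""
    actions = []
    for per_obs in batch_logits:
        q = q_from_quantiles(per_obs)
        actions.append(max(range(len(q)), key=q.__getitem__))
    return actions


if __name__ == "__main__":
    # One observation, three actions, two sampled quantiles per action.
    print(greedy_actions([[[1.0, 3.0], [0.0, 5.0], [2.0, 2.0]]]))
```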
