discrete
Source code: tianshou/utils/net/discrete.py
- dist_fn_categorical_from_logits(logits: Tensor) -> Categorical
Default distribution function for categorical actors.
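As a library-independent illustration of what a categorical distribution over logits means, the sketch below normalizes logits with a softmax and samples an action index. In PyTorch this presumably corresponds to `torch.distributions.Categorical(logits=logits)`; the helper names `softmax` and `sample_action` are hypothetical and not part of tianshou.

```python
import math
import random


def softmax(logits):
    """Convert unnormalized logits to probabilities (numerically stable)."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits, rng):
    """Draw one action index from the categorical distribution over logits."""
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


logits = [2.0, 0.5, -1.0]
probs = softmax(logits)                            # sums to 1, ordered like the logits
action = sample_action(logits, random.Random(0))   # an index in {0, 1, 2}
```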
- class DiscreteActor(*, preprocess_net: ModuleWithVectorOutput, action_shape: Sequence[int] | int | int64, hidden_sizes: Sequence[int] = (), softmax_output: bool = True)
Bases: AbstractDiscreteActor
Generic discrete actor which uses a preprocessing network to generate a latent representation, which is subsequently passed to an MLP to compute the output.
For common output semantics, see DiscreteActorInterface.
- Parameters:
preprocess_net – the preprocessing network, which outputs a vector of a known dimension; typically an instance of Net.
action_shape – a sequence of int (or a single int) for the shape of the action.
hidden_sizes – a sequence of int for constructing the MLP after preprocess_net. Defaults to an empty sequence (in which case the MLP contains only a single linear layer).
softmax_output – whether to apply a softmax layer over the last layer’s output.
- get_preprocess_net() -> ModuleWithVectorOutput
Returns the network component that is used for pre-processing, i.e. the component which produces a latent representation, which then is transformed into the final output. This is, therefore, the first part of the network which processes the input. For example, a CNN is often used in Atari examples.
We need this method to be able to share latent representation computations with other networks (e.g. critics) within an algorithm.
Actors that do not have a pre-processing stage can return nn.Identity() (see RandomActor for an example).
- forward(obs: Tensor | ndarray | BatchProtocol, state: T | None = None, info: dict[str, Any] | None = None) -> tuple[Tensor, T | None]
Mapping: (s_B, …) -> action_values_BA, hidden_state_BH | None.
Returns a tensor representing the values of each action, i.e., of shape (n_actions,) (see the class docstring for more information on its meaning), and a hidden state (which may be None). If self.softmax_output is True, the values are the probabilities of taking each action; otherwise, they are action values. The hidden state is not None only if a recurrent net is used as part of the learning algorithm.
- class DiscreteCritic(*, preprocess_net: ModuleWithVectorOutput, hidden_sizes: Sequence[int] = (), last_size: int = 1)
Bases: ModuleWithVectorOutput
Simple critic network for discrete action spaces.
- Parameters:
preprocess_net – the preprocessing network, which outputs a vector of a known dimension; typically an instance of Net.
hidden_sizes – a sequence of int for constructing the MLP after preprocess_net. Defaults to an empty sequence (in which case the MLP contains only a single linear layer).
last_size – the output dimension of the critic network.
output_dim – the dimension of the output vector.
- forward(obs: Tensor | ndarray | BatchProtocol, state: T | None = None, info: dict[str, Any] | None = None) -> Tensor
Mapping: s_B -> V(s)_B.
- class CosineEmbeddingNetwork(num_cosines: int, embedding_dim: int)
Bases: Module
Cosine embedding network for IQN. Converts a scalar in [0, 1] to a list of n-dim vectors.
- Parameters:
num_cosines – the number of cosines used for the embedding.
embedding_dim – the dimension of the embedding/output.
Note
From ku2482/fqf-iqn-qrdqn.pytorch (fqf_iqn_qrdqn/network.py).
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(taus: Tensor) -> Tensor
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
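To make the cosine embedding concrete, here is a minimal sketch of the feature computation only: each scalar tau in [0, 1] is expanded into `num_cosines` values cos(pi * i * tau). The index range i = 1..num_cosines is an assumption (the IQN paper indexes from 0), and the learned projection to `embedding_dim` that would follow is omitted. `cosine_features` is a hypothetical helper, not a tianshou function.

```python
import math


def cosine_features(taus, num_cosines):
    """Expand each tau in [0, 1] into [cos(pi * i * tau) for i = 1..num_cosines]."""
    return [
        [math.cos(math.pi * i * tau) for i in range(1, num_cosines + 1)]
        for tau in taus
    ]


# Three quantile fractions, four cosine features each.
features = cosine_features([0.0, 0.5, 1.0], num_cosines=4)
```

In the actual network, a linear layer (typically followed by a nonlinearity) would map each of these feature vectors to a vector of size `embedding_dim`.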
- class ImplicitQuantileNetwork(*, preprocess_net: ModuleWithVectorOutput, action_shape: Sequence[int] | int | int64, hidden_sizes: Sequence[int] = (), num_cosines: int = 64)
Bases: DiscreteCritic
Implicit Quantile Network.
- Parameters:
preprocess_net – a self-defined preprocess_net which outputs a flattened hidden state.
action_shape – a sequence of int (or a single int) for the shape of the action.
hidden_sizes – a sequence of int for constructing the MLP after preprocess_net. Defaults to an empty sequence (in which case the MLP contains only a single linear layer).
num_cosines – the number of cosines to use for the cosine embedding. Defaults to 64.
Note
Although this class inherits from Critic, it is actually a quantile Q-network with output shape (batch_size, action_dim, sample_size).
The second item of the first return value is the tau vector.
- Parameters:
output_dim – the dimension of the output vector.
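The output shape (batch_size, action_dim, sample_size) can be reduced to ordinary Q-values by averaging over the sample axis. The sketch below shows that reduction on nested lists; it is an assumption about how a caller would typically consume the quantiles, and both helper names are hypothetical.

```python
def q_values_from_quantiles(quantiles):
    """Average over the sample axis: (batch, action, sample) -> (batch, action)."""
    return [
        [sum(samples) / len(samples) for samples in per_action]
        for per_action in quantiles
    ]


def greedy_actions(q_values):
    """Pick the index of the largest Q-value in each batch row."""
    return [max(range(len(row)), key=row.__getitem__) for row in q_values]


# batch_size=2, action_dim=2, sample_size=3
quantiles = [
    [[0.0, 1.0, 2.0], [1.0, 1.0, 4.0]],
    [[3.0, 3.0, 3.0], [0.0, 2.0, 4.0]],
]
q_values = q_values_from_quantiles(quantiles)
actions = greedy_actions(q_values)
```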
- class FractionProposalNetwork(num_fractions: int, embedding_dim: int)
Bases: Module
Fraction proposal network for FQF.
- Parameters:
num_fractions – the number of fractions to propose.
embedding_dim – the dimension of the embedding/input.
Note
Adapted from ku2482/fqf-iqn-qrdqn.pytorch (fqf_iqn_qrdqn/network.py).
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(obs_embeddings: Tensor) -> tuple[Tensor, Tensor, Tensor]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
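The three returned tensors plausibly correspond to taus, tau_hats, and entropies. A minimal sketch of how such fractions can be derived from per-fraction logits follows, assuming the FQF construction (softmax, cumulative sum, midpoints); `propose_fractions` and its list-based interface are hypothetical.

```python
import math


def propose_fractions(logits):
    """Turn num_fractions logits into (taus, tau_hats, entropy).

    taus     : num_fractions + 1 increasing values from 0 to 1
    tau_hats : midpoints of consecutive taus
    entropy  : entropy of the softmax distribution over fractions
    """
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # The cumulative sum of probabilities gives the fraction boundaries.
    taus = [0.0]
    for p in probs:
        taus.append(taus[-1] + p)

    tau_hats = [(lo + hi) / 2 for lo, hi in zip(taus[:-1], taus[1:])]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return taus, tau_hats, entropy


# Equal logits yield evenly spaced fractions.
taus, tau_hats, entropy = propose_fractions([0.0, 0.0, 0.0, 0.0])
```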
- class FullQuantileFunction(*, preprocess_net: ModuleWithVectorOutput, action_shape: Sequence[int] | int | int64, hidden_sizes: Sequence[int] = (), num_cosines: int = 64)
Bases: ImplicitQuantileNetwork
Full(y parameterized) Quantile Function.
- Parameters:
preprocess_net – a self-defined preprocess_net which outputs a flattened hidden state.
action_shape – a sequence of int (or a single int) for the shape of the action.
hidden_sizes – a sequence of int for constructing the MLP after preprocess_net. Defaults to an empty sequence (in which case the MLP contains only a single linear layer).
num_cosines – the number of cosines to use for the cosine embedding. Defaults to 64.
Note
The first return value is a tuple of (quantiles, fractions, quantiles_tau), where fractions is a Batch(taus, tau_hats, entropies).
- Parameters:
output_dim – the dimension of the output vector.
- forward(obs: ndarray | Tensor, propose_model: FractionProposalNetwork, fractions: Batch | None = None, **kwargs: Any) -> tuple[Any, Tensor]
Mapping: s -> Q(s, *).
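In FQF, a Q-value is usually recovered from the quantile function as a sum of quantile values at the midpoints tau_hats, weighted by the widths of the proposed fractions. The sketch below illustrates that weighted sum for a single action; it follows the FQF formulation rather than this class's exact code, and `expected_q` is a hypothetical helper.

```python
def expected_q(taus, quantiles_at_tau_hats):
    """Approximate Q as sum_i (tau_{i+1} - tau_i) * quantile(tau_hat_i)."""
    widths = [hi - lo for lo, hi in zip(taus[:-1], taus[1:])]
    return sum(w * q for w, q in zip(widths, quantiles_at_tau_hats))


taus = [0.0, 0.5, 0.75, 1.0]       # fraction boundaries
quantile_values = [1.0, 2.0, 4.0]  # quantile estimates at the midpoints
q = expected_q(taus, quantile_values)  # 0.5*1 + 0.25*2 + 0.25*4
```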
- class NoisyLinear(in_features: int, out_features: int, noisy_std: float = 0.5)
Bases: Module
Implementation of Noisy Networks. arXiv:1706.10295.
- Parameters:
in_features – the number of input features.
out_features – the number of output features.
noisy_std – initial standard deviation of noisy linear layers.
Note
Adapted from ku2482/fqf-iqn-qrdqn.pytorch (fqf_iqn_qrdqn/network.py).
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x: Tensor) -> Tensor
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
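For intuition, the following sketch implements one forward pass of a noisy linear layer with factorized Gaussian noise, as described in the Noisy Networks paper. Whether this class uses the factorized variant, and the initialization sigma = noisy_std / sqrt(in_features) used below, are assumptions; `scale_noise` and `noisy_linear` are hypothetical names.

```python
import math
import random


def scale_noise(x):
    """f(x) = sign(x) * sqrt(|x|), used to build factorized noise."""
    return math.copysign(math.sqrt(abs(x)), x)


def noisy_linear(x, mu_w, sigma_w, mu_b, sigma_b, rng):
    """Compute y = (mu_w + sigma_w * eps_w) x + (mu_b + sigma_b * eps_b)."""
    in_features, out_features = len(x), len(mu_b)
    eps_in = [scale_noise(rng.gauss(0.0, 1.0)) for _ in range(in_features)]
    eps_out = [scale_noise(rng.gauss(0.0, 1.0)) for _ in range(out_features)]
    out = []
    for j in range(out_features):
        total = mu_b[j] + sigma_b[j] * eps_out[j]
        for i in range(in_features):
            # Factorized noise: eps_w[j][i] = eps_out[j] * eps_in[i]
            weight = mu_w[j][i] + sigma_w[j][i] * eps_out[j] * eps_in[i]
            total += weight * x[i]
        out.append(total)
    return out


x = [1.0, -2.0]
mu_w = [[0.5, 0.25], [-1.0, 0.0]]
mu_b = [0.1, 0.2]
sigma0 = 0.5 / math.sqrt(len(x))  # assumed initialization from noisy_std=0.5
sigma_w = [[sigma0, sigma0], [sigma0, sigma0]]
sigma_b = [sigma0, sigma0]

rng = random.Random(0)
noisy_a = noisy_linear(x, mu_w, sigma_w, mu_b, sigma_b, rng)
noisy_b = noisy_linear(x, mu_w, sigma_w, mu_b, sigma_b, rng)  # fresh noise
plain = noisy_linear(x, mu_w, [[0.0] * 2] * 2, mu_b, [0.0] * 2, rng)
```

With all sigmas set to zero, the layer reduces to an ordinary linear layer, which is why `plain` is deterministic while `noisy_a` and `noisy_b` differ.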
- class IntrinsicCuriosityModule(*, feature_net: Module, feature_dim: int, action_dim: int, hidden_sizes: Sequence[int] = ())
Bases: Module
Implementation of Intrinsic Curiosity Module. arXiv:1705.05363.
- Parameters:
feature_net – a self-defined feature_net which outputs a flattened hidden state.
feature_dim – input dimension of the feature net.
action_dim – dimension of the action space.
hidden_sizes – hidden layer sizes for forward and inverse models.
Initializes internal Module state, shared by both nn.Module and ScriptModule.
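In the cited paper, the forward model predicts the next state's features from the current features and the action, and the prediction error serves as the intrinsic reward. The sketch below computes that error for one transition. How this class scales or returns the reward is not stated here, so the `eta` factor and the function name `intrinsic_reward` are assumptions.

```python
def intrinsic_reward(predicted_next_features, next_features, eta=1.0):
    """Return eta / 2 * squared L2 distance between predicted and actual features."""
    squared_error = sum(
        (pred - true) ** 2
        for pred, true in zip(predicted_next_features, next_features)
    )
    return eta * 0.5 * squared_error


# A well-predicted transition yields no curiosity bonus ...
r_known = intrinsic_reward([1.0, 2.0], [1.0, 2.0])
# ... while a surprising one yields a positive bonus.
r_novel = intrinsic_reward([1.0, 2.0], [0.0, 4.0])
```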