Developer Guide#

The section addresses developers of Tianshou, providing information for both casual contributors and maintainers alike.

Python Virtual Environment#

Tianshou is built and managed by poetry.

The development environment uses Python 3.11.

To install all relevant requirements (as well as Tianshou itself in editable mode) you can simply call

poetry install --with dev

Important

Depending on your setup, you may need to create and activate an empty virtual environment using the right Python version beforehand. For instance, to do this with conda, use:

conda create -n tianshou python=3.11
conda activate tianshou

Code Style and Auto-Formatting#

When editing code in Tianshou, strive for local consistency, i.e. adhere to the style already present in the codebase.

Tianshou uses an auto-formatting for consistency. To apply it, call

poe format

To check whether your formatting is compliant without applying the auto-formatter, call

poe lint

Type Checking#

We use mypy to perform static type analysis. To check typing, run

poe type-check

Tests#

Running the Test Suite Locally#

Tianshou uses pytest. Tests are located in ./test.

To run the full set of tests locally, run

poe test

Determinism Tests#

We implemented determinism tests for Tianshou’s algorithms, which allow us to determine whether algorithms still compute exactly the same results even after large refactorings. These tests are applied by

  1. creating a behavior snapshot in the old code branch before the changes and then

  2. running the respective determinism test in the new branch to ensure that the behavior is the same.

Unfortunately, full determinism is difficult to achieve across different platforms and even different machines using the same platform an Python environment. Therefore, these tests are not carried out in the CI pipeline. Instead, it is up to the developer to run them locally and check the results whenever a change is made to the codebase that could affect algorithm behavior.

Technically, the two steps are handled by setting static flags in class AlgorithmDeterminismTest and then running either the full test suite or a specific determinism test (test_*_determinism, e.g. test_ddpg_determinism) in the two branches to be compared.

  1. On the old branch: (Temporarily) set ENABLED=True and FORCE_SNAPSHOT_UPDATE=True and run the test(s).

  2. On the new branch: (Temporarily) set ENABLED=True and FORCE_SNAPSHOT_UPDATE=False and run the test(s).

  3. Inspect the test results; find a summary in determinism_tests.log

Tests in CI (GitHub Actions)#

CI tests will extensively test Tianshou’s functionality in multiple environments.

In particular, we test

  • on Ubuntu (full functionality tested)

    • py_pinned: using the pinned development environment (Python 3.11, known versions of all dependencies)

    • py_latest: using a more recent Python version with the newest set of compatible dependencies (automatically resolved)

  • on Windows and macOS (core functionality tested)

Principle of Maximum Compatibility#

The idea behind testing with dynamically resolved dependencies is that we want to maximize the applicability of Tianshou: For important dependencies that could conflict with environments used by our users, we do not restrict the version of a dependency unless there is a known incompatibility.

If incompatibilities should arise (e.g. by the “py_latest” test failing), we either

  • resolve them by making the code compatible with both old and new versions OR

  • add an upper bound to our dependency declarations (excluding the incompatible versions) and release a new version of Tianshou to make these exclusions explicit.

High-Level API#

The high-level API provides a declarative, user-friendly interface for setting up reinforcement learning experiments. From a library developer’s perspective, it is important that this API be clearly structured and maintainable. This section explains the architectural principles and how to extend the API to support new algorithms.

Core Abstractions#

The high-level API is built around a clear separation of concerns:

Parameter Classes are dataclasses (inheriting from Params) that represent algorithm-specific configuration. They capture hyperparameters in a high-level, user-friendly form. Because the high-level interface must abstract away from low-level details, parameters may need transformation before being passed to policy classes. This is handled via ParamTransformer instances, which successively transform the parameter dictionary representation. To maintain clarity and reduce coupling, parameter transformers are co-located with the parameters they affect. The system uses inheritance and mixins extensively to reduce duplication while maintaining flexibility.

Factories embody the principle of declarative configuration. Because object creation may depend on other objects that don’t yet exist at configuration time (e.g., neural networks depend on environment properties), the API transitions from objects to factories. Key factory types include:

  • EnvFactory for creating training, test, and watch environments

  • AgentFactory as the central factory that creates policies, trainers, and collectors

  • Various specialized factories for optimizers, actors, critics, noise, distributions, learning rate schedulers, and policy wrappers

Algorithm Factories (subclasses of AlgorithmFactory) are the core components responsible for orchestrating the creation of all algorithm-specific objects. They handle the creation of neural network architectures, apply parameter transformations, instantiate policies, and create trainers with appropriate collectors. To support a new algorithm, this is the primary extension point.

Experiment Builders (subclasses of ExperimentBuilder) provide the user-facing interface following the builder pattern. They contain sensible defaults while allowing customization through fluent with_* methods. Builder mixins provide composable functionality for common patterns (e.g., actor/critic configuration), avoiding code duplication across algorithm-specific implementations.

Supporting a New Algorithm#

Extending the high-level API to support a new algorithm involves creating three main components:

Parameter Class: Define a dataclass in tianshou/highlevel/params/algorithm_params.py that inherits from appropriate base classes and mixins. The choice of base class depends on the algorithm’s architecture (actor-critic, single network, etc.) and learning paradigm (on-policy, off-policy). Override _get_param_transformers() to specify how high-level parameters should be transformed for the low-level policy API. Common transformers handle optimizer creation, noise instantiation, and environment-dependent parameter resolution.

Algorithm Factory: Implement a subclass of AlgorithmFactory in tianshou/highlevel/algorithm.py. In most cases, inherit from existing base factories like ActorCriticOnPolicyAlgorithmFactory, ActorCriticOffPolicyAlgorithmFactory, or DiscreteCriticOnlyOffPolicyAlgorithmFactory, which handle common creation patterns. The primary requirement is implementing _get_algorithm_class() to return the appropriate algorithm class. For algorithms with non-standard requirements, override _create_algorithm(), _create_kwargs(), etc. to customize the instantiation logic.

Experiment Builder: Add a builder class in tianshou/highlevel/experiment.py that inherits from OnPolicyExperimentBuilder or OffPolicyExperimentBuilder along with appropriate mixins. The mixins provide standard functionality for configuring actors and critics (single critic, dual critics, critic ensembles, parameter sharing patterns, etc.). The main responsibility is implementing _create_algorithm_factory() to instantiate the algorithm factory with appropriate parameters and network factories. Optionally provide with_* methods for algorithm-specific configuration.

Export the new classes in tianshou/highlevel/__init__.py to make them available to users.

Design Principles#

The architecture follows several key principles:

Separation of Concerns: Configuration is cleanly separated from implementation. The transformation system bridges these layers while maintaining independence.

Declarative Configuration: Factories enable a declarative style where experiments are defined by what should be created rather than imperative steps. This makes experiments easily serializable and reproducible.

Composition and Inheritance: Mixins and inheritance reduce code duplication. Common functionality is factored into reusable components while maintaining flexibility for algorithm-specific requirements.

Progressive Disclosure: The API provides sensible defaults for simple use cases while allowing deep customization when needed. Users can progress from simple configurations to advanced setups without fighting the abstractions.

Co-location: Related code is kept together. Parameter transformers are defined near the parameters they transform, maintaining clarity about dependencies and making the codebase easier to navigate.

Type Safety: Extensive use of generics and type hints ensures that type checkers can catch configuration errors at development time rather than runtime.

Documentation#

Documentation is in the docs/ directory, using Markdown (.md), ReStructuredText (.rst) and notebook files. index.rst is the main page.

API References are automatically generated by Sphinx according to the outlines under docs/api/ and should be modified when any code changes.

To compile documentation into webpage, run

poe doc-build

The generated webpages can subsequently be found in docs/_build and can be viewed with any browser.

Verifications#

We have several automated verification methods for documentation:

  1. pydocstyle (as part of ruff): tests all docstring under tianshou/;

  2. doc8 (as part of ruff): tests ReStructuredText format;

  3. sphinx spelling and test: test if there is any error/warning when generating front-end html documentation.

Creating a Release#

To release a new version on PyPI,

  • set the version to be released in tianshou/__init__.py and in pyproject.toml, creating a commit

  • tag the commit with the version (using the format v1.2.3)

  • push the commit (git push) and the tag (git push --tags)

  • create a new release on GitHub based on the tag; this will trigger the release job for PyPI.

In the past, we provided releases to conda-forge as well, but this is currently not maintained.