# Benchmark

## MuJoCo Benchmark
Tianshou's MuJoCo benchmark contains state-of-the-art results. Every experiment is conducted under 10 random seeds for 1-10M steps. Please refer to thu-ml/tianshou for source code and detailed results.
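As a concrete starting point, one way to launch such a seed sweep is sketched below. This is a hedged illustration: the script path `examples/mujoco/mujoco_sac.py`, its `--task`/`--seed` flags, and the `Ant-v3` task are assumptions based on the repository layout, so consult thu-ml/tianshou for the actual entry points and arguments.

```python
# Hypothetical sweep over the 10 benchmark seeds for one algorithm/task.
# The script name and flags are assumptions; check thu-ml/tianshou.
import subprocess

for seed in range(10):  # the benchmark averages over 10 random seeds
    subprocess.run(
        ["python", "examples/mujoco/mujoco_sac.py",
         "--task", "Ant-v3", "--seed", str(seed)],
        check=True,  # stop the sweep if any run fails
    )
```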
The table below compares the performance of Tianshou against published results on the OpenAI Gym MuJoCo benchmarks. We use the maximum average return within 1M timesteps as the reward metric (a code sketch of this metric follows the table). ~ means the result is approximated from a plot because no quantitative result is provided; / means the result is not provided at all. The best-performing baseline on each task is highlighted in boldface. Referenced baselines include the TD3 paper, the SAC paper, the PPO paper, the ACKTR paper, OpenAI Baselines, and Spinning Up.
| Algorithm | Implementation | Ant | HalfCheetah | Hopper | Walker2d | Swimmer | Humanoid | Reacher | IPendulum | IDPendulum |
|---|---|---|---|---|---|---|---|---|---|---|
| DDPG | Tianshou | 990.4 | **11718.7** | **2197.0** | 1400.6 | **144.1** | **177.3** | **-3.3** | **1000.0** | 8364.3 |
| | TD3 Paper | **1005.3** | 3305.6 | 2020.5 | 1843.6 | / | / | -6.5 | **1000.0** | **9355.5** |
| | TD3 Paper (Our) | 888.8 | 8577.3 | 1860.0 | **3098.1** | / | / | -4.0 | **1000.0** | 8370.0 |
| | Spinning Up | ~840 | ~11000 | ~1800 | ~1950 | ~137 | / | / | / | / |
| TD3 | Tianshou | **5116.4** | **10201.2** | 3472.2 | 3982.4 | **104.2** | **5189.5** | **-2.7** | **1000.0** | **9349.2** |
| | TD3 Paper | 4372.4 | 9637.0 | **3564.1** | **4682.8** | / | / | -3.6 | **1000.0** | 9337.5 |
| | Spinning Up | ~3800 | ~9750 | ~2860 | ~4000 | ~78 | / | / | / | / |
| SAC | Tianshou | **5850.2** | **12138.8** | **3542.2** | **5007.0** | **44.4** | **5488.5** | **-2.6** | **1000.0** | **9359.5** |
| | SAC Paper | ~3720 | ~10400 | ~3370 | ~3740 | / | ~5200 | / | / | / |
| | TD3 Paper | 655.4 | 2347.2 | 2996.7 | 1283.7 | / | / | -4.4 | **1000.0** | 8487.2 |
| | Spinning Up | ~3980 | ~11520 | ~3150 | ~4250 | ~41.7 | / | / | / | / |
| A2C | Tianshou | **3485.4** | **1829.9** | **1253.2** | **1091.6** | **36.6** | **1726.0** | **-6.7** | **1000.0** | **9257.7** |
| | PPO Paper | / | ~1000 | ~900 | ~850 | ~31 | / | ~-24 | **~1000** | ~7100 |
| | PPO Paper (TR) | / | ~930 | ~1220 | ~700 | ~36 | / | ~-27 | **~1000** | ~8100 |
| PPO | Tianshou | **3258.4** | **5783.9** | **2609.3** | **3588.5** | 66.7 | **787.1** | **-4.1** | **1000.0** | **9231.3** |
| | PPO Paper | / | ~1800 | ~2330 | ~3460 | ~108 | / | ~-7 | **~1000** | ~8000 |
| | TD3 Paper | 1083.2 | 1795.4 | 2164.7 | 3317.7 | / | / | -6.2 | **1000.0** | 8977.9 |
| | OpenAI Baselines | / | ~1700 | ~2400 | ~3510 | ~111 | / | ~-6 | ~940 | ~7350 |
| | Spinning Up | ~650 | ~1670 | ~1850 | ~1230 | **~120** | / | / | / | / |
| TRPO | Tianshou | **2866.7** | **4471.2** | 2046.0 | **3826.7** | 40.9 | **810.1** | -5.1 | **1000.0** | **8435.2** |
| | ACKTR Paper | ~0 | ~400 | ~1400 | ~550 | ~40 | / | -8 | **~1000** | ~800 |
| | PPO Paper | / | ~0 | ~2100 | ~1100 | **~121** | / | ~-115 | **~1000** | ~200 |
| | TD3 Paper | -75.9 | -15.6 | **2471.3** | 2321.5 | / | / | -111.4 | 985.4 | 205.9 |
| | OpenAI Baselines | / | ~1350 | ~2200 | ~2350 | ~95 | / | **~-5** | ~910 | ~7000 |
| | Spinning Up (TF) | ~150 | ~850 | ~1200 | ~600 | ~85 | / | / | / | / |
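For clarity, the sketch below shows one way to compute the "max average return" metric used in the table: average the evaluation returns at each checkpoint, then take the best checkpoint average within the step budget. The array layout is an assumption made for illustration.

```python
import numpy as np

def max_average_return(eval_returns: np.ndarray) -> float:
    """eval_returns holds per-episode evaluation returns with shape
    (num_checkpoints, num_eval_episodes), gathered within the 1M-step budget."""
    per_checkpoint_avg = eval_returns.mean(axis=1)  # average over episodes
    return float(per_checkpoint_avg.max())          # best checkpoint average

# Example: three checkpoints, two evaluation episodes each.
returns = np.array([[900.0, 1100.0], [2100.0, 2300.0], [1900.0, 2000.0]])
print(max_average_return(returns))  # 2200.0
```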
Average runtime over 8 MuJoCo benchmark tasks is listed below; a sketch of how such a breakdown can be measured follows the table. All results are obtained using a single Nvidia TITAN X GPU and up to 48 CPU cores (at most one CPU core per thread).
| Algorithm | # of Envs | Time for 1M Timesteps | Collecting (%) | Updating (%) | Evaluating (%) | Others (%) |
|---|---|---|---|---|---|---|
| DDPG | 1 | 2.9h | 12.0 | 80.2 | 2.4 | 5.4 |
| TD3 | 1 | 3.3h | 11.4 | 81.7 | 1.7 | 5.2 |
| SAC | 1 | 5.2h | 10.9 | 83.8 | 1.8 | 3.5 |
| REINFORCE | 64 | 4min | 84.9 | 1.8 | 12.5 | 0.8 |
| A2C | 16 | 7min | 62.5 | 28.0 | 6.6 | 2.9 |
| PPO | 64 | 24min | 11.4 | 85.3 | 3.2 | 0.2 |
| NPG | 16 | 7min | 65.1 | 24.9 | 9.5 | 0.6 |
| TRPO | 16 | 7min | 62.9 | 26.5 | 10.1 | 0.6 |
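The percentage columns above are wall-clock fractions of the training loop. The following is a minimal sketch of how such a breakdown can be collected; it is not Tianshou's internal profiler, just an illustration of the accounting.

```python
# Accumulate wall-clock time per phase of a training loop, then report
# each phase as a percentage of the total (the remainder is "others").
import time
from collections import defaultdict
from contextlib import contextmanager

timers = defaultdict(float)

@contextmanager
def phase(name):
    start = time.perf_counter()
    try:
        yield
    finally:
        timers[name] += time.perf_counter() - start

# In the loop one would wrap each stage, e.g.
#   with phase("collecting"): ...environment steps...
#   with phase("updating"):   ...gradient updates...
#   with phase("evaluating"): ...test episodes...

def breakdown(total_seconds):
    out = {k: 100.0 * v / total_seconds for k, v in timers.items()}
    out["others"] = 100.0 * (total_seconds - sum(timers.values())) / total_seconds
    return out
```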
## Atari Benchmark
Tianshou also provides a reliable and reproducible Atari 10M benchmark. Every experiment is conducted under 10 random seeds for 10M steps. Please refer to thu-ml/tianshou for the source code, and to https://wandb.ai/tianshou/atari.benchmark/reports/Atari-Benchmark–VmlldzoxOTA1NzA5 for detailed results hosted on Weights & Biases.
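The hosted results can also be queried programmatically through wandb's public API, as in the sketch below. The project path follows the link above, but the summary key is a guess; inspect `run.summary` on a real run for the actual logged names.

```python
# Hedged sketch: enumerate the benchmark runs and print an assumed
# summary metric ("best_reward" may not be the real key).
import wandb

api = wandb.Api()
for run in api.runs("tianshou/atari.benchmark"):
    print(run.name, run.summary.get("best_reward"))
```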
The table below compares the performance of Tianshou against published results on Atari games. We use the maximum average return within 10M timesteps as the reward metric (consistent with the MuJoCo benchmark). / means the result is not provided. The best-performing baseline on each task is highlighted in boldface. Referenced baselines include Google Dopamine and OpenAI Baselines.
| Algorithm | Implementation | Pong | Breakout | Enduro | Qbert | MsPacman | Seaquest | SpaceInvaders |
|---|---|---|---|---|---|---|---|---|
| DQN | Tianshou | **20.2 ± 2.3** | **133.5 ± 44.6** | 997.9 ± 180.6 | **11620.2 ± 786.1** | 2324.8 ± 359.8 | **3213.9 ± 381.6** | 947.9 ± 155.3 |
| | Dopamine | 9.8 | 92.2 | **2126.9** | 6836.7 | **2451.3** | 1406.6 | **1559.1** |
| | OpenAI Baselines | 16.5 | 131.5 | 479.8 | 3254.8 | / | 1164.1 | 1129.5 ± 145.3 |
| C51 | Tianshou | **20.6 ± 2.4** | **412.9 ± 35.8** | **940.8 ± 133.9** | **12513.2 ± 1274.6** | 2254.9 ± 201.2 | **3305.4 ± 1524.3** | 557.3 |
| | Dopamine | 17.4 | 222.4 | 665.3 | 9924.5 | **2860.4** | 1706.6 | **604.6 ± 157.5** |
| Rainbow | Tianshou | **20.2 ± 3.0** | **440.4 ± 50.1** | 1496.1 ± 112.3 | 14224.8 ± 1230.1 | 2524.2 ± 338.8 | 1934.6 ± 376.4 | **1178.4** |
| | Dopamine | 19.1 | 47.9 | **2185.1** | **15682.2** | **3161.7** | **3328.9** | 459.9 |
| IQN | Tianshou | **20.7 ± 2.9** | **355.9 ± 22.7** | **1252.7 ± 118.1** | **14409.2 ± 808.6** | 2228.6 ± 253.1 | 5341.2 ± 670.2 | 667.8 ± 81.5 |
| | Dopamine | 19.6 | 96.3 | 1227.6 | 12496.7 | **4422.7** | **16418** | **1358.2 ± 267.6** |
| PPO | Tianshou | **20.3 ± 1.2** | **283.0 ± 74.3** | **1098.9 ± 110.5** | **12341.8 ± 1760.7** | **1699.4 ± 248.0** | 1035.2 ± 353.6 | 1641.3 |
| | OpenAI Baselines | 13.7 | 114.3 | 350.2 | 7012.1 | / | **1218.9** | **1787.5 ± 340.8** |
| QR-DQN | Tianshou | 20.7 ± 2.0 | 228.3 ± 27.3 | 951.7 ± 333.5 | 14761.5 ± 862.9 | 2259.3 ± 269.2 | 4187.6 ± 725.7 | 1114.7 ± 116.9 |
| FQF | Tianshou | 20.4 ± 2.5 | 382.6 ± 29.5 | 1816.8 ± 314.3 | 15301.2 ± 684.1 | 2506.6 ± 402.5 | 8051.5 ± 3155.6 | 2558.3 |
Please note that the comparison tables for these two benchmarks cannot be used to prove which implementation is "better": the hyperparameters of the algorithms vary across implementations; the reward metric is not strictly the same (e.g., Tianshou uses the max average return within 10M steps, whereas OpenAI Baselines only report the average return at 10M steps, which makes a direct comparison unfair; see the toy example below); and Tianshou always uses 10 random seeds while others might use fewer. The comparison is shown here only to demonstrate Tianshou's reliability.
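To make the metric mismatch concrete, the toy example below (with made-up numbers) shows how the two conventions can disagree on the same training curve.

```python
# Per-checkpoint average returns of a hypothetical run whose performance
# peaks mid-training and then degrades slightly.
import numpy as np

avg_returns = np.array([120.0, 480.0, 450.0, 430.0])
print(avg_returns.max())  # 480.0: max average return within the budget (Tianshou-style)
print(avg_returns[-1])    # 430.0: average return at the final step (Baselines-style)
```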