| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| Hopper v5 | SAC+DBC(*) | Average Return | 3,732.5 | 101 | 1mo ago |
| LunarLanderContinuous v2 | | Mean Reward | 533.6 | 59 | 17d ago |
| Ant v5 | QVPO+DBC(*) | Average Return | 6,633.8 | 57 | 1mo ago |
| Atari 2600 Games Breakdown | PPO with RUDDER | Avg Reward (baseline) | 1,399,753 | 52 | 1mo ago |
| MountainCarContinuous v0 | | Average Agent Reward | 97 | 48 | 17d ago |
| CartPole v0 | CBRL | Mean Score | 200 | 48 | 17d ago |
| Walker2D v5 | TD3+DBC(*) | Average Return | 6,335.5 | 45 | 15d ago |
| Atari 2600 MONTEZUMA'S REVENGE | Go-Explore | Score | 18,003,200 | 45 | 1mo ago |
| Halfcheetah v5 | | Average Return | 13,996.2 | 43 | 1mo ago |
| Walker | CT-SAC | Average Returns | 1,035.52 | 38 | 1mo ago |
| HalfCheetah v3 | DACER | Mean Reward | 17,177 | 34 | 11d ago |
| Humanoid | Open-Ended Neural Reward Functions | Zero-Shot Reward | 90,921,063 | 30 | 1mo ago |
| CartPole Pure | | Average Reward (2/0.5) | 200 | 30 | 1mo ago |
| MountainCar (Pure) | CQL | Avg Reward (gamma=0.01) | -44.6 | 30 | 1mo ago |
| MuJoCo Half-Cheetah | SiMPO-Lin. Neg. | Average Return | 13,907 | 28 | 1mo ago |
| InvertedPendulum v2 | TTOpt | Mean Reward | 1,000 | 27 | 11d ago |
| Hopper v3 | DACER | Average Final Return | 4,104 | 26 | 11d ago |
| Walker2d v3 | DACER | Average Final Return | 6,701 | 26 | 11d ago |
| Ant v3 | DACER | Average Final Return | 9,108 | 26 | 11d ago |
| Humanoid v3 | DACER | Avg Final Return | 11,888 | 26 | 11d ago |
| Pendulum | TRPO | Avg Episode Reward | -145.49 | 26 | 1mo ago |
| CartPole v1 (test) | Qualitatively measured policy discrepancy w/ β | Total Reward | 500 | 25 | 1mo ago |
| MiniHack Corridor-5 | PPO-RNN | Mean Return | 1 | 24 | 1mo ago |
| Craftax Classic | PPO-RNN | Mean Return | 20.5 | 24 | 1mo ago |
| Craftax | PPO-Hybrid | Mean Return | 48.8 | 24 | 1mo ago |