| Dataset Name | SOTA Method | Metric | Value | Trend | |
|---|---|---|---|---|---|
| Hopper v5 | SAC+DBC(*) | Average Return | 3,732.5 | 101 | 2mo ago |
| MountainCarContinuous v0 | R2PO | Average Agent Reward | 98.75 | 65 | 22d ago |
| LunarLanderContinuous v2 | | Mean Reward | 533.6 | 65 | 23d ago |
| Ant v5 | QVPO+DBC(*) | Average Return | 6,633.8 | 57 | 2mo ago |
| Atari 2600 Games Breakdown | PPO with RUDDER | Avg Reward (baseline) | 1,399,753 | 52 | 3mo ago |
| CartPole v0 | CBRL | Mean Score | 200 | 48 | 2mo ago |
| Halfcheetah v5 | | Average Return | 13,996.2 | 47 | 1mo ago |
| Walker2D v5 | TD3+DBC(*) | Average Return | 6,335.5 | 45 | 2mo ago |
| Atari 2600 MONTEZUMA'S REVENGE | Go-Explore | Score | 18,003,200 | 45 | 3mo ago |
| Acrobot v1 | | Mean Return | 89.37 | 42 | 14d ago |
| Atari 100k | | Alien Score | 7,128 | 41 | 15d ago |
| Walker | CT-SAC | Average Returns | 1,035.52 | 38 | 3mo ago |
| HalfCheetah v3 | DACER | Mean Reward | 17,177 | 34 | 1mo ago |
| Humanoid | Open-Ended Neural Reward Functions | Zero-Shot Reward | 90,921,063 | 32 | 22d ago |
| Lunar Lander POMDP | VOMCPOW | Performance Score | 56.09 | 30 | 15d ago |
| the Room (test) | | Average Total Reward per Episode | 128 | 30 | 15d ago |
| CartPole Pure | | Average Reward (2/0.5) | 200 | 30 | 3mo ago |
| MountainCar (Pure) | CQL | Avg Reward (gamma=0.01) | -44.6 | 30 | 3mo ago |
| LunarLander v2 | Advantage-weighting | Final Return | 2,292 | 30 | 14d ago |
| CartPole | SALSA-RL | Average Reward | 1,000 | 29 | 22d ago |
| MuJoCo Half-Cheetah | SiMPO-Lin. Neg. | Average Return | 13,907 | 28 | 2mo ago |
| InvertedPendulum v2 | TTOpt | Mean Reward | 1,000 | 27 | 1mo ago |
| Hopper v4 | pop-SAN | Average Return | 27,721,263 | 26 | 21d ago |
| Hopper v3 | DACER | Average Final Return | 4,104 | 26 | 1mo ago |
| Walker2d v3 | DACER | Average Final Return | 6,701 | 26 | 1mo ago |