| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---:|---:|---|
| Hopper v5 | SAC+DBC(*) | Average Return | 3,732.5 | 93 | 4d ago |
| Atari 2600 Games Breakdown | PPO with RUDDER | Avg Reward (baseline) | 1,399,753 | 52 | 4d ago |
| Ant v5 | QVPO+DBC(*) | Average Return | 6,633.8 | 49 | 4d ago |
| Atari 2600 MONTEZUMA'S REVENGE | Go-Explore | Score | 18,003,200 | 45 | 4d ago |
| Walker2D v5 | TD3+DBC(*) | Average Return | 6,335.5 | 43 | 4d ago |
| Halfcheetah v5 | | Average Return | 13,996.2 | 43 | 4d ago |
| Walker | CT-SAC | Average Returns | 1,035.52 | 38 | 4d ago |
| Humanoid | Open-Ended Neural Reward Functions | Zero-Shot Reward | 90,921,063 | 30 | 4d ago |
| CartPole Pure | | Average Reward (2/0.5) | 200 | 30 | 4d ago |
| MountainCar (Pure) | CQL | Avg Reward (gamma=0.01) | -44.6 | 30 | 4d ago |
| CartPole v1 (test) | Qualitatively measured policy discrepancy w/ β | Total Reward | 500 | 25 | 4d ago |
| Trading | CT-SAC | Return | 37.72 | 24 | 4d ago |
| Cheetah | CT-SAC | Return | 934.76 | 24 | 4d ago |
| MuJoCo HumanoidStandup | TC-M2TD3 | Average Performance | 130,892 | 24 | 4d ago |
| Atari 2600 Montezuma's Revenge ALE (test) | SND-V | Score | 14,973 | 24 | 4d ago |
| Atari100k (test) | Human | Alien Score | 7,127.7 | 23 | 4d ago |
| LunarLander v2 | Advantage-weighting | Final Return | 2,292 | 23 | 4d ago |
| Atari 100K (test) | DIAMOND | Mean Score | 2.791 | 21 | 4d ago |
| Procgen (test) | Sparse Masked Attention Policies | BigFish Return | 21.61 | 21 | 4d ago |
| Atari 57 | LBC | Atlantis | 3,824,506.3 | 21 | 4d ago |
| Atari 100k steps (test) | | Median HNS | 0.809 | 20 | 4d ago |
| Atari 2600 57 games | Rainbow | Median Human-Normalized Score | 223 | 20 | 4d ago |
| Atari 2600 Private Eye ALE (test) | SND-VIC | Score | 17,313 | 19 | 4d ago |
| Atari 2600 Gravitar ALE (test) | SND-VIC | Score | 6,712 | 19 | 4d ago |
| MuJoCo Swimmer v4 | AD-SAC | Normalized Performance | 271 | 18 | 4d ago |