| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Reinforcement Learning | LunarLander v2 | Final Return | 2,292 | 30 |
| Reinforcement Learning | LunarLander | Average Episode Reward | 283.56 | 15 |
| Reinforcement Learning | LunarLander v3 | Average Agent Reward | 289 | 14 |
| Continuous-state and discrete-action control | LunarLander v3 | Average Reward | 232.9 | 13 |
| Reinforcement Learning | LunarLander | Training Time (min) | 0.8 | 13 |
| Control | LunarLander | Robustness Gap | 0.18 | 12 |
| Black-box Optimization | LunarLander frozen noise v3 | Total Proposal Time (s) | 0 | 9 |
| Continuous Control | LunarLander Nonmarkov v2 (test) | AUC@T | 107.6 | 9 |
| Black-box Optimization | LunarLander natural noise v3 | Total Proposal Time (s) | 0.3 | 8 |
| Reinforcement Learning | LunarLander v3 | Coefficient of Variation | 3.2 | 8 |
| Reinforcement Learning | LunarLander v3 | Episodes to Threshold (Score 200) | 290 | 8 |
| Reinforcement Learning | LunarLander classical control 1M steps | Return | 267.19 | 8 |
| Offline Reinforcement Learning | LunarLander Gym (100k) | Returns | 107 | 6 |
| Offline Reinforcement Learning | LunarLander Gym (10k) | Returns | 102 | 6 |
| Offline Reinforcement Learning | LunarLander Gym (1k) | Returns | 97 | 6 |
| Reinforcement Learning | LunarLander (LL) (test) | Average Undiscounted Reward | 241 | 6 |
| Trajectory Ranking | LunarLander v2 | Average Reward | 207.13 | 6 |
| Reinforcement Learning | LunarLander | Maximum Return | 260.6 | 5 |
| Reinforcement Learning | LunarLander | Return | 166.3 | 3 |
| Reinforcement Learning | LunarLander | Environment Episodes | 400,000 | 3 |
| Meta-Reinforcement Learning | Lunarlander g | FLOPs (k) | 0.015 | 3 |
| Reinforcement Learning | LunarLander v2 | Episodes to Target Reward | 750 | 2 |
| Reinforcement Learning | lunarlander Sticky | AUC@T | 36,783,880.67 | 2 |
| Reinforcement Learning | lunarlander Noisy | AUC@T | -25,766,227.01 | 2 |
| Reinforcement Learning | lunarlander Clean | AUC@T | 42,642,395.61 | 2 |