| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Reinforcement Learning | LunarLander v2 | Final Return | 2,292 | 23 |
| Reinforcement Learning | LunarLander | Average Episode Reward | 283.56 | 10 |
| Continuous Control | LunarLander Nonmarkov v2 (test) | AUC@T | 107.6 | 9 |
| Reinforcement Learning | LunarLander classical control 1M steps | Return | 267.19 | 8 |
| Trajectory Ranking | LunarLander v2 | Average Reward | 207.13 | 6 |
| Reinforcement Learning | LunarLander | Environment Episodes | 400,000 | 3 |
| Meta-Reinforcement Learning | LunarLander g | FLOPs (k) | 0.015 | 3 |
| Reinforcement Learning | LunarLander v3 | Average Agent Reward | 242.1 | 2 |
| Reinforcement Learning | LunarLander Sticky | AUC@T | 36,783,880.67 | 2 |
| Reinforcement Learning | LunarLander Noisy | AUC@T | -25,766,227.01 | 2 |
| Reinforcement Learning | LunarLander Clean | AUC@T | 42,642,395.61 | 2 |
| Reinforcement Learning | LunarLander standard (test) | Episode Length | 16.5 | 2 |
| Interpretability Evaluation | LunarLander | Interpretability Score | 4 | 2 |
| Stochastic Lipschitz Optimization | LunarLander | Regret | 7 | 1 |
| Meta-Reinforcement Learning | LunarLander | - | - | 0 |
| Continuous Control | LunarLander Nonmarkov v2 | AUC@T | - | 0 |