| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Reinforcement Learning | Humanoid | Zero-Shot Reward | 90,921,063 | 32 |
| Reinforcement Learning | Humanoid v3 | Avg Final Return | 11,888 | 26 |
| Humanoid Locomotion | Humanoid Randomized Task (OOD Sweep) | Reward | -3.58 | 24 |
| High-Dimensional Bayesian Optimization | Humanoid d = 6392 | Rank | 1 | 21 |
| Continuous Control | Humanoid 17-Dof | Final Return | 13,860 | 21 |
| Robot Locomotion | Humanoid | Cumulative Reward | 5,299 | 16 |
| Continuous Control | Humanoid MuJoCo v2 (evaluation) | Action Performance (p_act=0.1) | 5,078.3 | 14 |
| Continuous Control | Humanoid v5 | Average Return | 5,906.7 | 13 |
| Reinforcement Learning | Humanoid (delta=[0.8^6, 0.5^6, 0.2^5], kappa=4.0) v5 (test) | Return | 5,620 | 12 |
| Worst-case time-constrained reinforcement learning | Humanoid MuJoCo (test) | Normalized Worst-Case Reward | 4.02 | 12 |
| Robot Locomotion | Humanoid v1 (test) | Total Score | 93,123.84 | 12 |
| Reinforcement Learning | Humanoid v5 | Performance Score | 5,906.7 | 11 |
| Locomotion | Humanoid v4 | Mean Episode Return | 7,365.7 | 10 |
| Locomotion | Humanoid | Relative Return Improvement | 18.52 | 10 |
| Reinforcement Learning | Humanoid v4 | Reward | 5,715 | 9 |
| Black-box Optimization | Humanoid | Objective Value | 669.52 | 8 |
| High-Dimensional Locomotion | Humanoid v4 (test) | Reward | 6,907.99 | 8 |
| Reinforcement Learning | Humanoid v5 | Coefficient of Variation (%) | 6.3 | 8 |
| Reinforcement Learning | Humanoid v5 | Average Returns | 5,228 | 8 |
| Constrained Reinforcement Learning | Humanoid | Episodic Reward | 1,734.1 | 8 |
| Reinforcement Learning | Humanoid gravity v2 | Average Return | 6,360 | 8 |
| Trajectory Optimization | Humanoid Standup | Computational Time (s) | 17.6 | 8 |
| Continuous Control | Humanoid v4 | Average Cumulative Reward | 4,978.5 | 7 |
| Robotic Control | Humanoid v4 | Local Optima Escape Rate | 72.3 | 7 |
| Continuous Control | Humanoid | Humanoid Return (p_act=0.1) | 680.1 | 7 |