| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Reinforcement Learning | Humanoid | Zero-Shot Reward | 90,921,063 | 30 |
| Reinforcement Learning | Humanoid v3 | Avg Final Return | 11,888 | 26 |
| Humanoid Locomotion | Humanoid Randomized Task (OOD Sweep) | Reward | -3.58 | 24 |
| Continuous Control | Humanoid 17-Dof | Final Return | 13,860 | 21 |
| Robot Locomotion | Humanoid | Cumulative Reward | 5,299 | 16 |
| Continuous Control | Humanoid MuJoCo v2 (evaluation) | Action Performance (p_act=0.1) | 5,078.3 | 14 |
| Continuous Control | Humanoid v5 | Average Return | 5,906.7 | 13 |
| Worst-case time-constrained reinforcement learning | Humanoid MuJoCo (test) | Normalized Worst-Case Reward | 4.02 | 12 |
| Robot Locomotion | Humanoid v1 (test) | Total Score | 93,123.84 | 12 |
| Reinforcement Learning | Humanoid v5 | Performance Score | 5,906.7 | 11 |
| Locomotion | Humanoid | Relative Return Improvement | 18.52 | 10 |
| Reinforcement Learning | Humanoid v5 | Coefficient of Variation (%) | 6.3 | 8 |
| Reinforcement Learning | Humanoid v5 | Average Returns | 5,228 | 8 |
| Constrained Reinforcement Learning | Humanoid | Episodic Reward | 1,734.1 | 8 |
| Reinforcement Learning | Humanoid gravity v2 | Average Return | 6,360 | 8 |
| Continuous Control | Humanoid v4 | Average Cumulative Reward | 4,978.5 | 7 |
| Robotic Control | Humanoid v4 | Local Optima Escape Rate | 72.3 | 7 |
| Continuous Control | Humanoid | Humanoid Return (p_act=0.1) | 680.1 | 7 |
| Continuous Control | Humanoid v3 | Average Return | 4,963 | 7 |
| Locomotion | Humanoid v3 | Average Return | 5,353.5 | 7 |
| Reinforcement Learning | Humanoid v2 | Return | 8,048 | 7 |
| Locomotion | Humanoid v2 | Average Return | 10,490 | 6 |
| Locomotion | Humanoid Environment Faults v5 | Episodic Return | 198,257,932 | 5 |
| Locomotion | Humanoid Dynamic Faults v5 | Episodic Return | 152,825,979 | 5 |
| Locomotion | Humanoid Actuator Faults v5 | Episodic Return | 139,815,624 | 5 |