| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Reinforcement Learning | Humanoid | Zero-Shot Reward | 90,921,063 | 30 |
| Continuous Control | Humanoid 17-Dof | Final Return | 13,860 | 21 |
| Robot Locomotion | Humanoid | Cumulative Reward | 5,299 | 16 |
| Worst-case time-constrained reinforcement learning | Humanoid MuJoCo (test) | Normalized Worst-Case Reward | 4.02 | 12 |
| Robot Locomotion | Humanoid v1 (test) | Total Score | 93,123.84 | 12 |
| Reinforcement Learning | Humanoid v5 | Performance Score | 5,906.7 | 11 |
| Constrained Reinforcement Learning | Humanoid | Episodic Reward | 1,734.1 | 8 |
| Reinforcement Learning | Humanoid gravity v2 | Average Return | 6,360 | 8 |
| Continuous Control | Humanoid v3 | Average Return | 4,963 | 7 |
| Continuous Control | Humanoid v5 | Average Return | 5,906.7 | 7 |
| Locomotion | Humanoid v3 | Average Return | 5,353.5 | 7 |
| Reinforcement Learning | Humanoid v3 | Avg Final Return | 11,888 | 7 |
| Reinforcement Learning | Humanoid v2 | Return | 8,048 | 7 |
| Locomotion | Humanoid v2 | Average Return | 10,490 | 6 |
| Locomotion | Humanoid Environment Faults v5 | Episodic Return | 198,257,932 | 5 |
| Locomotion | Humanoid Dynamic Faults v5 | Episodic Return | 152,825,979 | 5 |
| Locomotion | Humanoid Actuator Faults v5 | Episodic Return | 139,815,624 | 5 |
| Motion in-betweening | Humanoid User Study (test) | Similar Score | 60.12 | 5 |
| Continuous Locomotion | Humanoid | Ground-truth Reward | 275.06 | 5 |
| Trajectory Optimization | Humanoid Standup | Computational Time (s) | 17.6 | 5 |
| Continuous Control | Humanoid Mujoco 1000k steps (train) | Training Time (h) | 11.43 | 4 |
| Continuous Control | Humanoid Mujoco 500k steps (train) | Time (h) | 5.72 | 4 |
| Continuous Control | Humanoid Mujoco 300k steps (train) | Time (h) | 3.43 | 4 |
| Locomotion Diversity Discovery | Humanoid visual input | Diversity Score | 0.71 | 3 |
| Motion Imitation | Humanoid Spinkick | Normalized Return | 77 | 3 |