| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Reinforcement Learning | Walker | Average Returns | 1,035.52 | 38 |
| Locomotion | Walker IID (test) | Mean Episode Reward | 1,909 | 24 |
| Locomotion Control | Walker sigma 0.1 (test) | Episode Reward | 1,909 | 24 |
| Offline Reinforcement Learning | Walker Gym-MuJoCo Medium-Expert D4RL | Normalized Score | 111.6 | 18 |
| Locomotion Control | Walker sigma 0.7 (test) | Episode Reward | 289 | 18 |
| Locomotion Control | Walker sigma 0.5 (test) | Episode Reward | 518 | 18 |
| Locomotion Control | Walker sigma 0.3 (test) | Episode Reward | 908 | 18 |
| Offline Reinforcement Learning | Walker Medium Gym-MuJoCo D4RL | Normalized Score | 84.7 | 16 |
| Reinforcement Learning | Walker fixed linear adversary | Average Performance | 5,256 | 12 |
| Worst-case time-constrained reinforcement learning | Walker MuJoCo (test) | Normalized Worst-Case Reward | 1.69 | 12 |
| Robust Reinforcement Learning | Walker fixed exponential adversary MuJoCo | Average Performance | 5,310 | 12 |
| Continuous Control | Walker MuJoCo (test) | Worst-case Performance | 5,724 | 12 |
| Robot Locomotion | Walker v1 (test) | Total Reward | 2,603.59 | 12 |
| Hurdles | Walker robot | Average Return | 15.3 | 9 |
| Actuator Inversion | Walker (Ceval-in) | AER | 849 | 8 |
| Actuator Inversion | Walker C (train) | AER | 845 | 8 |
| Robotic Control | Walker V | Average Return | 61,227 | 6 |
| Robotic Control | Walker-P | Average Return | 1,123,176 | 6 |
| Locomotion | Walker sigma=0.7 | Reward | 504 | 6 |
| Locomotion | Walker sigma=0.5 | Reward | 743 | 6 |
| Locomotion | Walker sigma=0.3 | Reward | 887 | 6 |
| Robotic control optimization | Walker | Generations | 10 | 5 |
| Reinforcement Learning | Walker-P | Time Cost | 3 | 5 |
| Reinforcement Learning | walker 10-p | Normalized Return | 102 | 4 |
| Reinforcement Learning | walker 2-p | Normalized Return | 130.6 | 4 |