| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Reinforcement Learning | Hopper v5 | Average Return | 3,732.5 | 93 |
| Offline Reinforcement Learning | hopper medium | Normalized Score | 3,729 | 52 |
| Offline Reinforcement Learning | hopper medium-replay | Normalized Score | 113 | 44 |
| Offline Reinforcement Learning | Hopper D4RL v2 (offline) | Average Score | 100.8 | 32 |
| Offline Reinforcement Learning | Hopper medium-expert | Normalized Score | 111.6 | 24 |
| Offline Reinforcement Learning | hopper Mixed Dataset | Normalized Reward | 108 | 24 |
| Locomotion | Hopper IID (test) | Mean Episode Reward | 1,859 | 24 |
| Locomotion Control | Hopper sigma 0.3 (test) | Episode Reward | 1,368 | 24 |
| Locomotion | Hopper | Convergence (%) | 100 | 20 |
| Offline Reinforcement Learning | Hopper expert | Normalized Score | 112.8 | 19 |
| Continuous Control | Hopper 3-DoF | Final Return | 2,735 | 18 |
| Locomotion Control | Hopper sigma 0.7 (test) | Episode Reward | 443 | 18 |
| Locomotion Control | Hopper sigma 0.5 (test) | Episode Reward | 729 | 18 |
| Locomotion Control | Hopper sigma 0.1 (test) | Episode Reward | 1,859 | 18 |
| Offline Reinforcement Learning | Hopper kinematic shifts | Score | 97 | 16 |
| Offline Reinforcement Learning | Hopper | Average Return | 2,116.2 | 16 |
| Reinforcement Learning | Hopper | Avg Episode Reward | 2,743.9 | 15 |
| Continuous Control | hopper | Average Reward | 2,133,326 | 15 |
| Offline Reinforcement Learning | Hopper Medium Noise 0 | Normalized Return | 95 | 14 |
| Offline Reinforcement Learning | Hopper Medium (Noise 5) | Normalized Return | 70.67 | 14 |
| Cross-Domain Offline Policy Adaptation | hopper-med Source Target | Normalized Score | 41.6 | 14 |
| Offline Policy Adaptation | hopper medium-expert | Normalized Score | 53.4 | 14 |
| Offline Policy Adaptation | Hopper medium-replay | Normalized Score | 36.8 | 14 |
| Offline Reinforcement Learning | Hopper random | Normalized Score | 32.2 | 14 |
| Reinforcement Learning | Hopper v4 | Average Return | 27,721,263 | 13 |
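Most of the offline RL rows above report the D4RL "Normalized Score", which linearly rescales a raw episode return so that a random policy maps to 0 and an expert policy to 100. A minimal sketch of that convention, using Hopper reference scores taken from the D4RL codebase (treat the exact constants as assumptions):

```python
def d4rl_normalized_score(raw_return: float,
                          random_score: float,
                          expert_score: float) -> float:
    """Map a raw episode return onto the D4RL 0-100 normalized scale."""
    return 100.0 * (raw_return - random_score) / (expert_score - random_score)


# Hopper reference scores as published in the D4RL repository
# (assumed values for illustration).
HOPPER_RANDOM = -20.272305
HOPPER_EXPERT = 3234.3

# Example: a raw Hopper return of 3600 lands slightly above 100,
# i.e. better than the expert reference policy.
score = d4rl_normalized_score(3600.0, HOPPER_RANDOM, HOPPER_EXPERT)
print(round(score, 1))
```

This explains why normalized scores above 100 (e.g. 111.6 or 112.8 in the table) are possible: the learned policy outperforms the expert demonstrations used to fix the scale.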