| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Offline Reinforcement Learning | D4RL Hopper Medium v2 | Normalized Return | 100.1 | 43 |
| Locomotion | D4RL Hopper medium-offline | Score | 40.77 | 36 |
| Offline Reinforcement Learning | D4RL Hopper Medium-Replay | Reward | 100.7 | 30 |
| Offline Reinforcement Learning | D4RL Hopper Med-Expert | Normalized Average Return | 113 | 21 |
| Continuous Control | D4RL Hopper medium | Normalized Return | 103.6 | 19 |
| Offline Reinforcement Learning | D4RL Hopper (Expert) | Mean Normalized Score | 113.2 | 16 |
| Offline Behavior Distillation | D4RL Hopper (medium-expert) | Normalized Return | 107.3 | 8 |
| Offline Behavior Distillation | D4RL Hopper medium | Normalized Return | 56.4 | 8 |
| Offline Reinforcement Learning | D4RL Hopper Simultaneous Adversarial Corruption | Average Score | 24.8 | 8 |
| Offline Reinforcement Learning | D4RL Hopper (Simultaneous Random Corruption) | Average Score | 28.83 | 8 |
| Offline Reinforcement Learning | Stochastic D4RL Hopper Medium MuJoCo | Mean Return | 1,014 | 8 |
| Offline Policy Evaluation | D4RL Hopper medium | RMSE | 8.5 | 7 |
| Offline Inverse Reinforcement Learning | D4RL Hopper Medium-Expert v2 | Cumulative Reward | 3,366.23 | 4 |
| Reinforcement Learning | D4RL Hopper short feet (medium) | Mean Return | 3,060 | 4 |
| Reinforcement Learning | D4RL Hopper broken hips (medium) | Mean Return | 2,785 | 4 |
| Reinforcement Learning | D4RL Hopper Med-Expert | D4RL Score | 1.0389 | 2 |
| Reinforcement Learning | D4RL Hopper Medium | D4RL Score | 80.86 | 2 |