| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Offline Reinforcement Learning | D4RL Walker2d Medium v2 | Normalized Return | 95.5 | 85 |
| Offline Reinforcement Learning | D4RL walker2d medium-replay v2 | Normalized Score | 100.6 | 46 |
| Offline Reinforcement Learning | D4RL Walker2D Expert | Mean Normalized Score | 117.4 | 38 |
| Locomotion | D4RL Walker2d medium-offline | Normalized Score | 37.45 | 36 |
| Locomotion Control | D4RL Walker2d medium-expert | Normalized Return | 111.2 | 23 |
| Offline Reinforcement Learning | D4RL Walker2d Medium | Normalized Avg Return | 87.7 | 20 |
| Continuous Control | D4RL Walker2d medium | Normalized Return | 81.9 | 14 |
| Locomotion | D4RL Walker2d medium-replay v2 | Online Normalized Return | 131.58 | 12 |
| Offline Reinforcement Learning | D4RL Walker2d-M | Normalized Return | 85 | 11 |
| Offline Reinforcement Learning | D4RL Walker2d Med-Replay | Normalized Average Return | 82.6 | 11 |
| MuJoCo Locomotion | D4RL walker2d medium-expert v2 | Score (Normalized) | 111.2 | 9 |
| Offline Reinforcement Learning | D4RL Walker2d Full-Replay | Normalized Score | 107.8 | 8 |
| Offline Behavior Distillation | D4RL Walker2D (medium-expert) | Normalized Return | 109 | 8 |
| Offline Behavior Distillation | D4RL Walker2D medium | Normalized Return | 84 | 8 |
| Offline Reinforcement Learning | D4RL Walker2d Simultaneous Random Corruption | Average Score | 23.62 | 8 |
| Offline Reinforcement Learning | D4RL Walker2d Stochastic MuJoCo (Mixed) | Mean Return | 450 | 8 |
| Offline Policy Evaluation | D4RL Walker2d medium | RMSE | 149 | 7 |
| Offline Reinforcement Learning | D4RL Walker2d random v0 | Return | 412 | 6 |
| Inverse Reinforcement Learning | D4RL Walker2d medium-expert | Return | 5,384 | 5 |
| Continuous Control | D4RL Walker2d expert | Normalized Return | 125.7 | 5 |
| Offline Inverse Reinforcement Learning | D4RL Walker2d Medium-Expert v2 | Cumulative Reward | 4,049.43 | 4 |
| Offline Inverse Reinforcement Learning | D4RL Walker2d Medium v2 | Cumulative Reward | 4,121.68 | 4 |
| Locomotion | D4RL Walker2d-expert | Normalized Score (100k Steps) | 120.35 | 3 |
| Locomotion | D4RL walker2d medium v2 | Normalized Return | 102.9 | 2 |
| Off-policy Evaluation | D4RL Walker2D-medium-expert | RMAE | 0.252 | 2 |