| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Offline Reinforcement Learning | D4RL Walker2d Medium v2 | Normalized Return: 94.2 | 67 |
| Locomotion | D4RL Walker2d medium-offline | Normalized Score: 37.45 | 36 |
| Offline Reinforcement Learning | D4RL walker2d medium-replay v2 | Normalized Score: 100.6 | 36 |
| Locomotion Control | D4RL Walker2d medium-expert | Normalized Return: 111.2 | 23 |
| Offline Reinforcement Learning | D4RL Walker2D Expert | Mean Normalized Score: 117.4 | 22 |
| Offline Reinforcement Learning | D4RL Walker2d Medium | Normalized Avg Return: 87.7 | 18 |
| Continuous Control | D4RL Walker2d medium | Normalized Return: 81.9 | 14 |
| Offline Reinforcement Learning | D4RL Walker2d Med-Replay | Normalized Average Return: 82.6 | 11 |
| Offline Behavior Distillation | D4RL Walker2D (medium-expert) | Normalized Return: 109 | 8 |
| Offline Behavior Distillation | D4RL Walker2D medium | Normalized Return: 84 | 8 |
| Offline Reinforcement Learning | D4RL Walker2d Simultaneous Random Corruption | Average Score: 23.62 | 8 |
| Offline Reinforcement Learning | D4RL Walker2d Stochastic MuJoCo (Mixed) | Mean Return: 450 | 8 |
| Offline Policy Evaluation | D4RL Walker2d medium | RMSE: 149 | 7 |
| Offline Reinforcement Learning | D4RL Walker2d random v0 | Return: 412 | 6 |
| Continuous Control | D4RL Walker2d expert | Normalized Return: 125.7 | 5 |
| Offline Inverse Reinforcement Learning | D4RL Walker2d Medium-Expert v2 | Cumulative Reward: 4,049.43 | 4 |
| Offline Inverse Reinforcement Learning | D4RL Walker2d Medium v2 | Cumulative Reward: 4,121.68 | 4 |
| Locomotion | D4RL Walker2d-expert | Normalized Score (100k Steps): 120.35 | 3 |
| Off-policy Evaluation | D4RL Walker2D-medium-expert | RMAE: 0.252 | 2 |