| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Offline Reinforcement Learning | D4RL MuJoCo Walker2d-mr v2 (medium-replay) | Average Normalized Score | 95.6 | 29 |
| Offline Reinforcement Learning | D4RL MuJoCo Hopper-mr v2 (medium-replay) | Avg Normalized Score | 104.4 | 29 |
| Offline Reinforcement Learning | D4RL MuJoCo Walker2d medium-expert v2 | Average Normalized Score | 117.2 | 24 |
| Offline Reinforcement Learning | D4RL MuJoCo Halfcheetah-mr v2 (medium-replay) | Avg Normalized Score | 72.1 | 24 |
| Offline Reinforcement Learning | D4RL MuJoCo Hopper-m v2 (medium) | Avg Normalized Score | 107.4 | 24 |
| Offline Reinforcement Learning | D4RL MuJoCo Hopper-Medium-Expert v2 | Normalized Score | 111.9 | 22 |
| Offline Reinforcement Learning | D4RL MuJoCo Halfcheetah-Medium-Expert v2 | Normalized Score | 94.3 | 17 |
| Offline Reinforcement Learning | D4RL MuJoCo Walker2d-e v2 (expert) | Normalized Score | 110.2 | 14 |
| Offline Reinforcement Learning | D4RL MuJoCo Walker2d-m medium v2 | Average Normalized Score | 86.8 | 14 |
| Offline Reinforcement Learning | D4RL MuJoCo Halfcheetah-m v2 (medium) | Average Normalized Score | 48.3 | 14 |
| Offline Reinforcement Learning | D4RL MuJoCo Hopper-e v2 (expert) | Average Normalized Score | 110 | 14 |
| Offline Reinforcement Learning | D4RL MuJoCo hopper-medium-expert | Normalized Score | 111.2 | 13 |
| Offline Reinforcement Learning | D4RL MuJoCo Locomotion Domain v2 | Return (HalfCheetah, M-E) | 96 | 10 |
| Offline Reinforcement Learning | D4RL MuJoCo v2 (test) | HalfCheetah-Medium Score | 77.6 | 10 |
| Offline Reinforcement Learning | D4RL MuJoCo walker2d-medium-expert v0 | Normalized Score | 108.4 | 8 |
| Offline Reinforcement Learning | D4RL MuJoCo hopper-medium-expert v0 | Normalized Avg Score | 112.7 | 8 |
| Offline Reinforcement Learning | D4RL MuJoCo hopper-medium v0 | Avg Score (Normalized) | 100.4 | 8 |
| Offline Reinforcement Learning | D4RL MuJoCo hopper-random v0 | Normalized Score | 11.9 | 8 |
| Cross-domain Offline Imitation Learning from Demonstrations (C-off-LfD) | D4RL MuJoCo reward-free v2 (medium, medium-replay, medium-expert) | Hopper-v2 Return (medium) | 58.4 | 7 |
| Single-domain Offline Imitation Learning from Demonstrations (S-off-LfD) | D4RL MuJoCo reward-free v2 (medium, medium-replay, medium-expert) | Hopper-v2 (m) Score | 110.4 | 7 |
| Locomotion | D4RL MuJoCo hopper-medium-replay v2 | Normalized Score | 98.94 | 4 |
| Offline Reinforcement Learning | D4RL MuJoCo v2 | Ant Return (Random) | 31.62 | 4 |
| Cross-domain Offline Imitation Learning from Observations (C-off-LfO) | D4RL MuJoCo reward-free v2 (medium, medium-replay, medium-expert) | Hopper v2 (m) Score | 55.5 | 3 |
| Single-domain Offline Imitation Learning from Observations (S-off-LfO) | D4RL MuJoCo reward-free v2 (medium, medium-replay, medium-expert) | Hopper-v2 (m) | 54.5 | 3 |
| Offline Reinforcement Learning | D4RL MuJoCo Halfcheetah-Medium v2 | Normalized Score | 42.3 | 3 |
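Most rows above report a D4RL normalized score. D4RL rescales a raw episode return so that 0 corresponds to a random-policy reference return and 100 to an expert-policy reference return, which is why entries such as 117.2 can exceed 100 (they beat the expert reference). The minimal sketch below shows that linear rescaling. The function name `normalized_score` is hypothetical, and the reference returns in the usage lines are made-up illustrative values, not the official per-environment D4RL constants.

```python
def normalized_score(raw_return: float, random_return: float, expert_return: float) -> float:
    """Linearly map a raw return onto the 0-100 normalized scale.

    0 = random-policy reference return, 100 = expert-policy reference return.
    Values above 100 mean the policy beat the expert reference.
    """
    if expert_return == random_return:
        raise ValueError("expert and random reference returns must differ")
    return 100.0 * (raw_return - random_return) / (expert_return - random_return)


# Illustrative reference returns only (NOT the official D4RL constants).
print(normalized_score(500.0, random_return=0.0, expert_return=1000.0))   # 50.0
print(normalized_score(1100.0, random_return=0.0, expert_return=1000.0))  # 110.0
```

Because the mapping is linear, averaging normalized scores over seeds gives the same result as normalizing the averaged raw return for a single environment.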