| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Target Acquisition | Reacher Quadrant (test) | Target Hit Rate (per min) | 19.7 | 15 |
| Target Acquisition | Reacher Linear (test) | Hit Rate (per minute) | 20 | 15 |
| Target Acquisition | Reacher Continuous (test) | Hit Rate (per min) | 18.8 | 15 |
| Reinforcement Learning | Reacher | Average Return | -4.1 | 12 |
| Reaching | Reacher 3D | Success Rate | 95.1 | 10 |
| Continuous Control | reacher | Average Reward | 0.72 | 9 |
| Continuous Control | Reacher v5 | Average Episodic Reward | -3.9 | 8 |
| Actuator Inversion | Reacher H (eval-in) | AER | 291 | 8 |
| Actuator Inversion | Reacher E C (eval-in) | AER | 582 | 8 |
| Actuator Inversion | Reacher H (train) | AER | 290 | 8 |
| Actuator Inversion | Reacher E C (train) | AER | 584 | 8 |
| Control | Reacher v2 | Median Samples | 251 | 8 |
| Continuous Control | Reacher v2 | Average Return | -4 | 7 |
| Off-dynamics Reinforcement Learning | Reacher 0.5 density v1 (test) | Reward | -11.7 | 7 |
| Off-dynamics Reinforcement Learning | Reacher broken source environment MuJoCo | Average Reward | 30 | 7 |
| Reinforcement Learning | Reacher 1.5 gravity MuJoCo | Reward | -9.5 | 7 |
| Reinforcement Learning | Reacher 0.5 gravity (test) | Average Return | -7.1 | 7 |
| Continuous Control | Reacher v1 (train) | Max Avg Return | -3.6 | 7 |
| Continuous Control (Negative Reward) | Reacher Pybullet | Mean Return | 16.8 | 6 |
| Continuous Control (Positive Reward) | Reacher Pybullet | Mean Return | 18.7 | 6 |
| Continuous Control (Negative Reward) | Reacher Mujoco | Mean Return | -6.3 | 6 |
| Continuous Control | fixedReacher | Average Reward | 0.849 | 6 |
| Reinforcement Learning | Reacher | Maximum Return | 6.48 | 5 |
| Continuous Robotic Control | Reacher normal v2 (test) | Final Performance | -2.39 | 5 |
| Extrapolative Generalization | Reacher-hard Unseen Goal | Mean Reward | 967.64 | 5 |
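Several rows above report an average return (or average episodic reward), i.e. the per-episode cumulative reward averaged over evaluation episodes. As a minimal stdlib-only sketch of how that metric is computed, with hypothetical reward data standing in for real Reacher rollouts:

```python
def average_return(episodes):
    """Mean over episodes of the per-episode sum of step rewards."""
    return sum(sum(ep) for ep in episodes) / len(episodes)

# Hypothetical per-step rewards for three short evaluation episodes
# (Reacher step rewards are typically small negative values, so the
# average return is usually slightly below zero, as in the table).
episodes = [
    [-0.5, -0.3, -0.1],
    [-0.6, -0.4, -0.2],
    [-0.4, -0.2, -0.1],
]
print(average_return(episodes))
```

Maximum return (as opposed to average) would instead take `max` over the per-episode sums; the table lists both variants for the Reacher dataset.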