| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Reinforcement Learning | Acrobot v1 | Mean Return | 89.37 | 42 |
| Imitation Learning | Acrobot v1 (test) | Optimality (%) | 103.67 | 15 |
| Continuous-state and discrete-action control | Acrobot v1 | Final Average Reward | 495.4 | 13 |
| Reinforcement Learning | Acrobot | Training Time (min) | 1.9 | 13 |
| Control | Acrobot | Robustness Gap | 0.32 | 12 |
| Reinforcement Learning | Acrobot | Average Returns | -82.5 | 10 |
| Continuous Control | Acrobot Nonmarkov v1 (test) | AUC@T | -82.7 | 9 |
| Reinforcement Learning | Acrobot (AB) (test) | Avg Undiscounted Reward | 80 | 6 |
| Identifying Optimal Trajectories | Acrobot v1 (top-5 trajectories) | Average Trajectory Length | 73.2 | 6 |
| Classic Discrete Control | Acrobot v1 | Mean Episodic Return | 94 | 5 |
| Trajectory Optimization | Acrobot | Runtime (seconds) | 5.09 | 5 |
| Acrobot Control | Acrobot | Success Rate | 100 | 4 |
| Reinforcement Learning | Acrobot standard (train/test) | Model Parameters | 850 | 4 |
| Sensory-motor control | Acrobot Gymnasium | Mean Best Reward | -77.3 | 2 |
| Reinforcement Learning | acrobot Sticky | AUC@T | -84,528,355.1 | 2 |
| Reinforcement Learning | acrobot Noisy | AUC@T | -80,111,477.55 | 2 |
| Reinforcement Learning | acrobot Clean | AUC@T | -79,144,263.52 | 2 |
| Continuous Control | Acrobot Nonmarkov v1 | AUC@T | - | 0 |