| Task Name | Dataset Name | SOTA Result | Trend | |
|---|---|---|---|---|
| Reinforcement Learning | CartPole v0 | Mean Score: 200 | 48 | |
| Reinforcement Learning | CartPole Pure | Average Reward (2/0.5): 200 | 30 | |
| Reinforcement Learning | CartPole | Average Reward: 1,000 | 29 | |
| Reinforcement Learning | CartPole v1 (test) | Total Reward: 500 | 25 | |
| Classic Discrete Control | CartPole v1 | Mean Episodic Return: 495 | 18 | |
| Reinforcement Learning | CartPole v1 | Return: 354,122 | 16 | |
| Imitation Learning | Cartpole v1 (test) | Optimality (%): 100 | 15 | |
| Reinforcement Learning | CartPole | Wall-clock Training Time (min): 0.1 | 13 | |
| Continuous Control | Cartpole | Median Samples: 1.63 | 10 | |
| Reinforcement Learning | CartPole Pure-8-40 | Average Reward (EpLen=2, DF=0.5): 200 | 10 | |
| Reinforcement Learning | CartPole | Max Return: 500 | 9 | |
| Forecasting | CartPole Time-Varying (CP TV) (held-out trajectories) | MSE: 0.0007 | 8 | |
| Forecasting | CartPole Time-Invariant (held-out trajectories) | MSE: 2.28 | 8 | |
| Reinforcement Learning | CartPole Setting C v0 (test) | Performance (2/0.15): 236.4 | 8 | |
| Reinforcement Learning | CartPole Setting B v0 (test) | Steps Survived (Config 0.15): 192.4 | 8 | |
| Reinforcement Learning | CartPole Setting A v0 (test) | Performance Score (Config 2/0.85): 98.2 | 8 | |
| Actuator Inversion | Cartpole (Ceval-in) | AER: 659 | 8 | |
| Actuator Inversion | Cartpole (train) | AER: 658 | 8 | |
| Control | Cartpole swing-up | Median Samples: 111 | 8 | |
| Safe Reinforcement Learning | Safe CartPole | Training Time (s): 68.7 | 7 | |
| Reinforcement Learning | Safe CartPole | Episode Reward: 200 | 7 | |
| Offline Reinforcement Learning | CartPole 100k Gym | Returns: 100 | 6 | |
| Offline Reinforcement Learning | CartPole Gym (10k) | Returns: 100 | 6 | |
| Offline Reinforcement Learning | CartPole 1k Gym | Returns: 90 | 6 | |
| Reinforcement Learning | CartPole (CP) (test) | Cumulative Reward: 500 | 6 | |