| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Reinforcement Learning | CartPole v0 | Mean Score | 200 | 48 |
| Reinforcement Learning | CartPole Pure | Average Reward (2/0.5) | 200 | 30 |
| Reinforcement Learning | CartPole v1 (test) | Total Reward | 500 | 25 |
| Reinforcement Learning | CartPole | Average Reward | 1,000 | 20 |
| Continuous Control | Cartpole | Median Samples | 1.63 | 10 |
| Reinforcement Learning | CartPole Pure-8-40 | Average Reward (EpLen=2, DF=0.5) | 200 | 10 |
| Reinforcement Learning | CartPole Setting C v0 (test) | Performance (2/0.15) | 236.4 | 8 |
| Reinforcement Learning | CartPole Setting B v0 (test) | Steps Survived (Config 0.15) | 192.4 | 8 |
| Reinforcement Learning | CartPole Setting A v0 (test) | Performance Score (Config 2/0.85) | 98.2 | 8 |
| Actuator Inversion | Cartpole (Ceval-in) | AER | 659 | 8 |
| Actuator Inversion | Cartpole (train) | AER | 658 | 8 |
| Control | Cartpole swing-up | Median Samples | 111 | 8 |
| Safe Reinforcement Learning | Safe CartPole | Training Time (s) | 68.7 | 7 |
| Reinforcement Learning | Safe CartPole | Episode Reward | 200 | 7 |
| Offline Reinforcement Learning | CartPole 100k Gym | Returns | 100 | 6 |
| Offline Reinforcement Learning | CartPole Gym (10k) | Returns | 100 | 6 |
| Offline Reinforcement Learning | CartPole 1k Gym | Returns | 90 | 6 |
| Reinforcement Learning | CartPole (CP) (test) | Cumulative Reward | 500 | 6 |
| Control Task | CartPole (test) | Average Reward | 494.09 | 6 |
| Robustness Evaluation | CartPole A=9.5 (test) | Average Reward | 231.8 | 6 |
| Robustness Evaluation | CartPole A=9.0 (test) | Average Reward | 830.9 | 6 |
| Robustness Evaluation | CartPole A=8.5 (test) | Average Reward | 810.1 | 6 |
| Robustness Evaluation | CartPole A=8.0 (test) | Average Reward | 1,000 | 6 |
| Reinforcement Learning | CartPole | Max Return | 500 | 5 |
| Sensory-motor control | CartPole *2 | Reward (First Iter, Worst) | 9.8 | 5 |