| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Reinforcement Learning Control | Pendulum v1 | Mean Score | 1,378.78 | 40 |
| Reinforcement Learning | Pendulum | Avg Episode Reward | -145.49 | 26 |
| Reinforcement Learning | Pendulum v1 (test) | Average Return | -164.82 | 16 |
| Regression | Pendulum (test) | MSE | 0.0034 | 14 |
| Rollout prediction | Pendulum | Rollout MSE | 1.05 | 12 |
| Continuous Control | Pendulum | Median Samples | 5.6 | 12 |
| Continuous Control | Pendulum Nonmarkov v1 (test) | AUC@T | -556.9 | 9 |
| Control | Pendulum v0 | Median Samples | 21 | 9 |
| Transition model estimation | Pendulum discretized n = 10^5 | Failure Rate | 0 | 8 |
| Image Interpolation | Pendulum (test) | MSE | 1 | 8 |
| Reinforcement Learning | Pendulum classical control (1M steps) | Return | -133.42 | 8 |
| Continuous Control | Pendulum v1 | Average Cumulative Reward | -152.4 | 7 |
| Robotic Control | Pendulum v1 | Local Optima Escape Rate | 89.2 | 7 |
| Reinforcement Learning | Pendulum PD-C (test) | Cumulative Reward | 854 | 6 |
| Continuous Control (Negative Reward) | Pendulum Pybullet | Mean Return | 9,124.6 | 6 |
| Continuous Control (Positive Reward) | Pendulum Pybullet | Return | 9,043.3 | 6 |
| Continuous Control (Negative Reward) | Pendulum Mujoco | Mean Return | 8,132.1 | 6 |
| Continuous Control (Positive Reward) | Pendulum Mujoco | Return | 9,358.4 | 6 |
| MCTS Aggregation Strategy Evaluation | Pendulum | MRR | 0.75 | 6 |
| angular velocity decoding (prediction) | pendulum | R^2 | 0.727 | 6 |
| angular velocity decoding (smoothing) | pendulum | R-squared | 99.7 | 6 |
| Reinforcement Learning | Pendulum | Average Decisions | 1,000 | 6 |
| Continuous Control | Pendulum | Action Repetition | 1.12 | 6 |
| Property Prediction | Pendulum | Pendulum Angle | 1,555.33 | 6 |
| Imitation Learning | Pendulum | Mean Score | -179.6 | 6 |