| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---:|---:|
| Reinforcement Learning | Pendulum v1 (test) | Average Return | -164.82 | 16 |
| Reinforcement Learning | Pendulum | Average Episode Reward | -145.49 | 15 |
| Regression | Pendulum (test) | MSE | 0.0034 | 14 |
| Continuous Control | Pendulum | Median Samples | 5.6 | 12 |
| Continuous Control | Pendulum Nonmarkov v1 (test) | AUC@T | -556.9 | 9 |
| Control | Pendulum v0 | Median Samples | 21 | 9 |
| Transition Model Estimation | Pendulum, discretized (n = 10⁵) | Failure Rate | 0 | 8 |
| Image Interpolation | Pendulum (test) | MSE | 1 | 8 |
| Reinforcement Learning | Pendulum classical control (1M steps) | Return | -133.42 | 8 |
| Continuous Control (Negative Reward) | Pendulum PyBullet | Mean Return | 9,124.6 | 6 |
| Continuous Control (Positive Reward) | Pendulum PyBullet | Return | 9,043.3 | 6 |
| Continuous Control (Negative Reward) | Pendulum MuJoCo | Mean Return | 8,132.1 | 6 |
| Continuous Control (Positive Reward) | Pendulum MuJoCo | Return | 9,358.4 | 6 |
| MCTS Aggregation Strategy Evaluation | Pendulum | MRR | 0.75 | 6 |
| Angular Velocity Decoding (Prediction) | Pendulum | R² | 0.727 | 6 |
| Angular Velocity Decoding (Smoothing) | Pendulum | R² (%) | 99.7 | 6 |
| Reinforcement Learning | Pendulum | Average Decisions | 1,000 | 6 |
| Continuous Control | Pendulum | Action Repetition | 1.12 | 6 |
| Property Prediction | Pendulum | Pendulum Angle | 1,555.33 | 6 |
| Imitation Learning | Pendulum | Mean Score | -179.6 | 6 |
| Forecasting | Pendulum | MSE | 0.283 | 5 |
| Causal Representation Learning | Pendulum | MIC | 98.94 | 5 |
| System Identification | Pendulum (test) | Average MSE | 0.72 | 5 |
| Regression | Pendulum | MSE | 3.41 | 5 |
| Offline Decision Making | Pendulum visual | Average Return | -155.4 | 4 |