| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Policy Evaluation | Cart-Pole off-policy, perfect features | MSE 0.17 | 7 |
| Policy Evaluation | Cart-Pole on-policy, perfect features | MSE 0.15 | 7 |
| Policy Evaluation | Cart-Pole off-policy, impoverished features | MSE 2.33 | 7 |
| Policy Evaluation | Cart-Pole on-policy, impoverished features | MSE 2.37 | 7 |
| Reinforcement Learning | Cart-Pole Domain Generalization - Pole Length OpenAI Gym (3 held-out domains) | Average Reward 175.25 | 6 |
| Reinforcement Learning | Cart-Pole OpenAI Gym (3 held-out domains (variable pole length and cart mass)) | Return 170.81 | 6 |
| Classic Control | Cart Pole OpenAI Gym (evaluation) | Mean Score 178.3 | 5 |
| Optimal Control | Cart Pole | Final Cost 17.49 | 2 |
| End-to-end learning and planning | Cart-Pole Swing-Up | Cost (Best) 87.78 | 1 |
| Q-only open-loop forecasting | Cart-Pole windy | - | 0 |