| Dataset Name | SOTA Method | Metric | Score | Trend | Updated |
|---|---|---|---|---|---|
| CartPole *2 | gpt-oss:120b | Reward (First Iter, Worst) | 9.8 | 5 | 4d ago |
| CartPole | gpt-oss:120b | Reward (First Iteration, Worst Rep) | 42.15 | 5 | 4d ago |
| Pendulum Gymnasium | gpt-oss | Mean Best Reward | -190.31 | 2 | 4d ago |
| MountainCar Discrete Gymnasium | gpt-oss | Mean Best Reward | -111.54 | 2 | 4d ago |
| MountainCar Continuous Gymnasium | gpt-oss | Mean Best Reward | 94.81 | 2 | 4d ago |
| Inverted Pendulum MuJoCo | qwen2.5 | Mean Best Reward | 829.66 | 2 | 4d ago |
| Acrobot Gymnasium | gpt-oss | Mean Best Reward | -77.3 | 2 | 4d ago |