| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Imitation Learning from Observation | InvertedPendulum v4 | AER: 5.7 | 8 |
| Reinforcement Learning | InvertedPendulum v2 | Mean Reward: 1,000 | 8 |
| Continuous Control | InvertedPendulum v2 | Average Return: 1,000 | 7 |
| Continuous Control | InvertedPendulum v1 (train) | Max Average Return: 1,000 | 7 |