| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Reinforcement Learning | InvertedDoublePendulum v3 | Average Final Return: 9,360 | 7 |
| Continuous Control | InvertedDoublePendulum v1 (train) | Max Average Return: 9,355.52 | 7 |
| Reinforcement Learning | InvertedDoublePendulum v4 | Average Episodic Reward: 9,167.5 | 4 |
| Continuous Control | InvertedDoublePendulum v5 | Average Episodic Reward: 9,349.2 | 2 |