InvertedDoublePendulum

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Reinforcement Learning | InvertedDoublePendulum v3 | Average Final Return: 9,360 | 7 |
| Continuous Control | InvertedDoublePendulum v1 (train) | Max Average Return: 9,355.52 | 7 |
| Reinforcement Learning | InvertedDoublePendulum v4 | Average Episodic Reward: 9,167.5 | 4 |
| Continuous Control | InvertedDoublePendulum v5 | Average Episodic Reward: 9,349.2 | 2 |