
InvertedPendulum

Benchmarks

Task Name | Dataset Name | SOTA Metric | SOTA Result | Trend
----------|--------------|-------------|-------------|------
Reinforcement Learning | InvertedPendulum v2 | Mean Reward | 1,000 | 27
Continuous Control | InvertedPendulum v5 | Average Episodic Reward | 1,000 | 8
Imitation Learning from Observation | InvertedPendulum v4 | AER | 5.7 | 8
Continuous Control | InvertedPendulum v2 | Average Return | 1,000 | 7
Continuous Control | InvertedPendulum v1 (train) | Max Average Return | 1,000 | 7
Reinforcement Learning | InvertedPendulum v4 | Average Episodic Reward | 1,000 | 4
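Most of the SOTA results listed for InvertedPendulum sit at exactly 1,000. The sketch below illustrates why that value acts as a ceiling. It assumes the Gymnasium/MuJoCo InvertedPendulum setup, in which the agent receives a reward of +1 for every step the pole stays upright and episodes are truncated at 1,000 steps. It also assumes that "Average Episodic Reward" means the mean undiscounted return over evaluation episodes. The helper `simulate_episode` is hypothetical and stands in for a real environment rollout. The "AER" figure in the imitation-learning row may be reported on a different scale and is not modeled here.

```python
from typing import Sequence

# Assumption: Gymnasium-style InvertedPendulum, +1 reward for each step the
# pole stays upright, episodes truncated after 1,000 steps.
MAX_EPISODE_STEPS = 1000
REWARD_PER_STEP = 1.0


def episode_return(step_rewards: Sequence[float]) -> float:
    """Undiscounted return of one episode: the sum of its per-step rewards."""
    return float(sum(step_rewards))


def average_episodic_reward(episodes: Sequence[Sequence[float]]) -> float:
    """Mean undiscounted return across evaluation episodes."""
    if not episodes:
        raise ValueError("need at least one episode")
    return sum(episode_return(ep) for ep in episodes) / len(episodes)


def simulate_episode(steps_balanced: int) -> list[float]:
    """Hypothetical stand-in for a rollout: the reward trace of an episode in
    which the pole falls after `steps_balanced` steps, or is cut off by the
    1,000-step time limit, whichever comes first."""
    length = min(steps_balanced, MAX_EPISODE_STEPS)
    return [REWARD_PER_STEP] * length


if __name__ == "__main__":
    # A policy that would balance indefinitely is still capped by the time limit.
    perfect = [simulate_episode(5000) for _ in range(10)]
    print(average_episodic_reward(perfect))  # 1000.0

    # Episodes that fail early pull the average below the ceiling.
    mixed = [simulate_episode(n) for n in (1000, 500, 250)]
    print(average_episodic_reward(mixed))
```

Under these assumptions, any policy that keeps the pole upright for the full episode scores 1,000, so several methods can tie at the maximum. This is why the benchmark is often regarded as saturated.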