
Reinforcement Learning on Pendulum (Average Episode Reward)

-145.49 Avg Episode Reward

TRPO

[Chart: Avg Episode Reward over time; x-axis Nov 2, 2023 to Jan 26, 2026; y-axis -5,954.42 to -1,430.16]
Updated 4d ago

Evaluation Results

Date      Avg Episode Reward
2023.11   -145.49
2023.11   -151.72
2023.11   -154.82
2023.11   -155.06
2023.11   -155.4
2023.11   -155.6
2023.11   -157.59
2023.11   -160.14
2026.01   -190
2023.11   -201.57
2026.01   -1,161
2023.11   -1,206.9
2026.01   -2,116
2026.01   -2,226
2026.01   -5,731