Reinforcement Learning on Pendulum (Average Episode Reward)
Best result: TRPO, Avg Episode Reward -145.49
[Chart: Avg Episode Reward over time, Nov 2, 2023 to Jan 26, 2026]
Updated 4d ago
Evaluation Results
| Method | Date | Avg Episode Reward |
| --- | --- | --- |
| TRPO | 2023.11 | -145.49 |
| ESPL | 2023.11 | -151.72 |
| SAC | 2023.11 | -154.82 |
| TD3 | 2023.11 | -155.06 |
| DSP | 2023.11 | -155.4 |
| DDPG | 2023.11 | -155.6 |
| A2C | 2023.11 | -157.59 |
| PPO | 2023.11 | -160.14 |
| AC-CG (batch size=1000, seeds=5) | 2026.01 | -190 |
| ACKTR | 2023.11 | -201.57 |
| AC-KFAC (batch size=1000, seeds=5) | 2026.01 | -1,161 |
| Regression | 2023.11 | -1,206.9 |
| AC-SGD (batch size=1000, seeds=5) | 2026.01 | -2,116 |
| SMAC (batch size=1000, seeds=5) | 2026.01 | -2,226 |
| AC-Adam (batch size=1000, seeds=5) | 2026.01 | -5,731 |
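The page does not define how Avg Episode Reward is computed. A minimal sketch, assuming the usual definition (sum the per-step rewards within each episode, then average those sums over all evaluation episodes), is shown below. The function name `average_episode_reward` and the sample rewards are hypothetical and are not benchmark data. The results list the least negative value first, so values closer to zero rank higher.

```python
from statistics import mean


def average_episode_reward(episodes):
    """Return the mean of per-episode returns.

    `episodes` is a sequence of episodes, each a sequence of per-step rewards.
    Assumed definition; the leaderboard does not state its exact protocol.
    """
    if not episodes:
        raise ValueError("need at least one episode")
    # Episode return = sum of the rewards collected during that episode.
    returns = [sum(step_rewards) for step_rewards in episodes]
    return mean(returns)


# Hypothetical per-step rewards for two short episodes (illustration only).
sample_episodes = [[-1.0, -2.0, -0.5], [-3.0, -1.5]]
score = average_episode_reward(sample_episodes)
print(score)
```

Individual entries may differ in episode length, number of evaluation episodes, and number of seeds, so scores reported under different settings are not necessarily directly comparable.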