Reinforcement Learning on Gym-MuJoCo Walker2D
[Chart: Average Return. Top entry: SiMPO-Linear, 4,909 (Mar 10, 2026).]
Evaluation Results
Method            Training Steps   Date      Average Return
SiMPO-Linear      1M               2026.03   4,909
SiMPO-Lin. Neg.   1M               2026.03   4,906
SAC               1M               2026.03   4,625
SiMPO-Exp         1M               2026.03   4,616
SiMPO-Square      1M               2026.03   4,478
QSM               1M               2026.03   3,933
DIPO              1M               2026.03   3,809
TD3               1M               2026.03   3,732
QVPO              1M               2026.03   2,866
DACER             1M               2026.03   1,871