Reinforcement Learning on FailureBench Bounded Push
[Chart: Average Return. Best result: FARL, 4,593.96 (Jan 12, 2026).]
Evaluation Results
Method    Settings                     Date      Average Return
FARL      mode=fine-tuning, init...    2026.01         4,593.96
PPO-Lag   mode=fine-tuning, init...    2026.01           420.79
CPO       mode=fine-tuning, init...    2026.01           156.28
P3O       mode=fine-tuning, init...    2026.01            95.33