Reinforcement Learning on FailureBench Obstructed Push
[Chart: Average Return over time. Top entry: FARL, 1,227.18 (Jan 12, 2026)]
Updated 3mo ago
Evaluation Results
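| Method  | Settings                  | Date    | Average Return |
|---------|---------------------------|---------|----------------|
| FARL    | mode=fine-tuning, init... | 2026.01 | 1,227.18       |
| PPO-Lag | mode=fine-tuning, init... | 2026.01 | 543.95         |
| P3O     | mode=fine-tuning, init... | 2026.01 | 455.86         |
| CPO     | mode=fine-tuning, init... | 2026.01 | 47.4           |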