Preference Estimation on PE-FD=8 (test)
[Chart: Average Outcome Reward. Top result: 61.28, AREW– AS + BT, Mar 12, 2026. Updated 1mo ago.]
Evaluation Results
Method               Training Protocol   Date      Average Outcome Reward
AREW– AS + BT        PPO-...             2026.03   61.28
AREW– AS ONLY        PPO-...             2026.03   55.61
VANILLA              PPO-...             2026.03   55.21
AREW– AS + BT        PPO-...             2026.03   47.89
AREW– AS ONLY        PPO-...             2026.03   39.62
VANILLA              PPO-...             2026.03   30.52
O4-MINI              DIRE...             2026.03   8.42
LLAMA-3.1-8B-INST.   DIRE...             2026.03   3.14
QWEN-2.5-7B-INST.    DIRE...             2026.03   2.67