Preference Estimation on PE-FD=6 (test)
Top result: 56.91 Average Outcome Reward (AREW– AS ONLY), Mar 12, 2026
[Chart: Average Outcome Reward over time]
Evaluation Results
Method               Training Protocol   Date      Average Outcome Reward
AREW– AS ONLY        PPO-...             2026.03   56.91
AREW– AS + BT        PPO-...             2026.03   54.65
AREW– AS + BT        PPO-...             2026.03   44.47
AREW– AS ONLY        PPO-...             2026.03   42.1
VANILLA              PPO-...             2026.03   32.03
O4-MINI              DIRE...             2026.03   12.47
VANILLA              PPO-...             2026.03   6
LLAMA-3.1-8B-INST.   DIRE...             2026.03   5.7
QWEN-2.5-7B-INST.    DIRE...             2026.03   4.2