Preference Estimation on PE-GS=2 (test)
[Chart: Average Outcome Reward. Top entry: AREW– AS + BT, 54.67 (Mar 12, 2026)]
Updated 1mo ago
Evaluation Results
Method               Training Protocol   Date      Average Outcome Reward
AREW– AS + BT        PPO-...             2026.03   54.67
AREW– AS + BT        PPO-...             2026.03   49.33
AREW– AS ONLY        PPO-...             2026.03   49
AREW– AS ONLY        PPO-...             2026.03   46
VANILLA              PPO-...             2026.03   27.33
VANILLA              PPO-...             2026.03   24
LLAMA-3.1-8B-INST.   DIRE...             2026.03   18
O4-MINI              DIRE...             2026.03   17.11
QWEN-2.5-7B-INST.    DIRE...             2026.03   15