Preference Estimation on PE GS=3 (test)
[Chart: Average Outcome Reward. Top entry: AREW– AS + BT, 80.33, Mar 12, 2026. Updated 1mo ago.]
Evaluation Results

Method               Training Protocol   Date      Average Outcome Reward
AREW– AS + BT        PPO-...             2026.03   80.33
AREW– AS + BT        PPO-...             2026.03   77.67
AREW– AS ONLY        PPO-...             2026.03   73
AREW– AS ONLY        PPO-...             2026.03   32
O4-MINI              DIRE...             2026.03   21.15
VANILLA              PPO-...             2026.03   18.33
LLAMA-3.1-8B-INST.   DIRE...             2026.03   12.33
VANILLA              PPO-...             2026.03   11
QWEN-2.5-7B-INST.    DIRE...             2026.03   10