RLHF on HH-RLHF (held-out)
[Chart: Peak Gold Reward over time. Highlighted result: DRRO-RLHF, 1.59 (Apr 30, 2026). Selectable metrics: Peak Gold Reward, Peak Proxy Reward, Peak Sequence-Level KL. Updated 1mo ago.]
Evaluation Results
| Method | Date | Peak Gold Reward | Peak Proxy Reward | Peak Sequence-Level KL |
|---|---|---|---|---|
| DRRO-RLHF (Type=soft + dynamic) | 2026.04 | 1.59 | 3.03 | 35.74 |
| GRPO | 2026.04 | 1.35 | 3.06 | 44.83 |
| DRRO-RLHF (Type=hard) | 2026.04 | 1.29 | 2.87 | 32.33 |
| BSPO | 2026.04 | 1.21 | 3.02 | 42.61 |
| PPO | 2026.04 | 1.2 | 2.43 | 18.25 |
| InfoRM | 2026.04 | 1.14 | 3.06 | 44.25 |
| Ensemble-UWO | 2026.04 | 0.99 | 2.84 | 34.51 |
| Ensemble-Mean | 2026.04 | 0.94 | 2.71 | 32.05 |
| DRO-RLHF | 2026.04 | 0.54 | 1.72 | 16.54 |