Reward Hacking Mitigation on Length Bias OA Length 1.0 (Evaluation)
[Chart: Dominance and Win Rate by submission date. Top entry: IR3 Method C (Constrained), Dominance 15 (Feb 23, 2026)]
Evaluation Results
Method | Setting | Date | Dominance | Win Rate
--- | --- | --- | --- | ---
IR3 Method C (Constrained) | type=Constrained | 2026.02 | 15 | 68.5
IR3 Method B (Adversarial) | type=Adversarial | 2026.02 | 16 | 67.2
IR3 Method A (Clean RL) | type=Clean RL | 2026.02 | 21 | 63.5
IR3 Method D (Distillation) | type=Distillation | 2026.02 | 24 | 71.8
Length Penalty | alpha=5e-4 | 2026.02 | 32 | 56.2
KL Regularization | beta=2e-2 | 2026.02 | 35 | 54.8
Reward Clipping | c=4 | 2026.02 | 36 | 53.8
PPO Clipping | epsilon=0.1 | 2026.02 | 38 | 52.5
PPO on R_proxy | | 2026.02 | 42 | 50
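Each baseline row lists a single hyperparameter (alpha, beta, c, epsilon). As a rough illustration of where such a knob typically enters an RLHF loop trained on a proxy reward R_proxy, here is a minimal Python sketch under assumed formulations: a penalty linear in response length (in tokens), a KL term estimated as the summed per-token log-ratio against a reference policy, symmetric clipping of the proxy reward to [-c, c], and the standard PPO clipped surrogate. All function names are hypothetical, and the benchmark's actual implementations may differ. The IR3 methods are not sketched, since the page gives no detail about them.

```python
# Hypothetical sketch of the baseline mitigations' hyperparameters.
# Assumed formulations only; defaults mirror the table's settings.


def length_penalized_reward(r_proxy: float, length: int, alpha: float = 5e-4) -> float:
    """Assumed form: subtract a penalty linear in response length (tokens)."""
    return r_proxy - alpha * length


def kl_regularized_reward(
    r_proxy: float,
    logp_policy: list[float],
    logp_ref: list[float],
    beta: float = 2e-2,
) -> float:
    """Assumed form: subtract beta times a per-sample KL estimate,
    computed as the summed per-token log-ratio (policy vs. reference)."""
    kl_estimate = sum(p - q for p, q in zip(logp_policy, logp_ref))
    return r_proxy - beta * kl_estimate


def clipped_reward(r_proxy: float, c: float = 4.0) -> float:
    """Assumed form: clip the proxy reward symmetrically to [-c, c]."""
    return max(-c, min(c, r_proxy))


def ppo_clipped_objective(ratio: float, advantage: float, epsilon: float = 0.1) -> float:
    """Standard PPO clipped surrogate for one sample (to be maximized).
    ratio = pi_new(a|s) / pi_old(a|s)."""
    clipped_ratio = max(1.0 - epsilon, min(1.0 + epsilon, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # Illustrative inputs, not benchmark data.
    print(length_penalized_reward(2.0, 1000))
    print(kl_regularized_reward(2.0, [-1.0, -0.5], [-1.2, -0.7]))
    print(clipped_reward(7.3))
    print(ppo_clipped_objective(1.3, 2.0))
```

The first three functions reshape the scalar reward before advantage estimation, so they directly change what the policy is rewarded for; PPO clipping instead limits how far a single update can move the policy, which only indirectly slows exploitation of R_proxy.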