Code Gaming Mitigation on Code Gaming (held-out tests)
[Chart: Gaming Rate and PASS@1; data point dated Feb 2, 2026. Highlighted: SFT (NO RLHF), Gaming Rate 4.2]
Updated 4d ago
Evaluation Results
| Method | Details | Date | Gaming Rate | PASS@1 |
|---|---|---|---|---|
| SFT (NO RLHF) | RLHF=None | 2026.02 | 4.2 | 28.5 |
| ARA | Framework=Auditor-gate... | 2026.02 | 19.6 | 35.8 |
| INFORM | Category=Regularizatio... | 2026.02 | 39.8 | 33.1 |
| ODIN | Category=Regularizatio... | 2026.02 | 42.1 | 33.5 |
| RM ENSEMBLE | Category=Reward Model... | 2026.02 | 44.3 | 34 |
| FILTERING (R > mu + 2sigma) | Category=Training Inte... | 2026.02 | 46.7 | 32.4 |
| PPO W/KL | Category=Regularizatio... | 2026.02 | 48.5 | 31.8 |
| PPO | Condition=Unmitigated | 2026.02 | 61.3 | 34.2 |