Reward Hacking on LeetCode with write access (test)
[Figure: Hack Rate by date. Labeled point: Gen.-time suppression, Hack Rate 1, Apr 1, 2026.]
Updated 16d ago
Evaluation Results
| Method | Configuration | Date | Hack Rate |
| --- | --- | --- | --- |
| Gen.-time suppression | Model=Llama-3.2-3B, Wr... | 2026.04 | 1 |
| Adv. Mod. (additive) | Model=Llama-3.2-3B, Al... | 2026.04 | 1.2 |
| Adv. Mod. (multiplicative) | Model=Llama-3.2-3B, Al... | 2026.04 | 1.8 |
| No intervention | Model=Llama-3.2-3B, Wr... | 2026.04 | 1.9 |
| No intervention | Model=Phi-4-mini, Writ... | 2026.04 | 2.5 |
| Adv. Mod. (multiplicative) | Model=Phi-4-mini, Alph... | 2026.04 | 2.7 |
| Adv. Mod. (additive) | Model=Phi-4-mini, Alph... | 2026.04 | 4.9 |
| Gen.-time suppression | Model=Phi-4-mini, Writ... | 2026.04 | 5.3 |
| Adv. Mod. (multiplicative) | Model=Llama-3.2-3B, al... | 2026.04 | 15.1 |
| Adv. Mod. (multiplicative) | Model=Phi-4-mini, alph... | 2026.04 | 24.9 |
| Adv. Mod. (additive) | Model=Llama-3.2-3B, al... | 2026.04 | 47.8 |
| Gen.-time suppression | Model=Llama-3.2-3B | 2026.04 | 53.4 |
| Adv. Mod. (additive) | Model=Phi-4-mini, alph... | 2026.04 | 64.8 |
| Gen.-time suppression | Model=Phi-4-mini | 2026.04 | 72.3 |
| No intervention | Model=Llama-3.2-3B | 2026.04 | 78.9 |
| No intervention | Model=Phi-4-mini | 2026.04 | 99.9 |