Mathematical Reasoning on GSM8K (Drop in Utility)
[Chart: Drop in Utility over time. Labeled point: KL, 36.7 (Apr 1, 2026).]
Updated 5d ago
Evaluation Results
| Method | Configuration | Date | Drop in Utility |
|---|---|---|---|
| KL | Alignment Strategy=Del... | 2026.04 | 36.7 |
| Latent Similarity - Layer 12 | Alignment Strategy=Del... | 2026.04 | 5.3 |
| Random | Alignment Strategy=Del... | 2026.04 | 1.9 |
| Perplexity | Alignment Strategy=Del... | 2026.04 | -6.2 |
| Self Certainty | Alignment Strategy=Del... | 2026.04 | -8.8 |
| Latent Similarity - Best Layer | Alignment Strategy=Del... | 2026.04 | -12.7 |