Mathematical Reasoning on GSM8K (Drop in Utility (%))
[Chart: Drop in Utility by date. Labeled point: Latent Similarity, -7.8 (Apr 1, 2026).]
Updated 5d ago
Evaluation Results
| Method | Configuration | Date | Drop in Utility (%) |
|---|---|---|---|
| Latent Similarity | Selection Criterion=La... | 2026.04 | -7.8 |
| Self Certainty | Selection Criterion=Se... | 2026.04 | -6 |
| Perplexity | Selection Criterion=Pe... | 2026.04 | -5.1 |
| Random | Selection Criterion=Ra... | 2026.04 | 1.6 |
| Latent Similarity | Selection Criterion=La... | 2026.04 | 7.6 |
| KL | Selection Criterion=KL | 2026.04 | 29.8 |