Mathematical Reasoning on GSM8K (Accuracy Improvement)
[Chart: Accuracy Improvement (GSM8K) by date. Series: Ultra-Dense Prompting; labeled point 5.7, Apr 19, 2026. Updated 1mo ago.]
Evaluation Results
Method                 Model                  Date     Accuracy Improvement (GSM8K)
Ultra-Dense Prompting  Gemini 2.0             2026.04  5.7
Ultra-Dense Prompting  Claude 3.7             2026.04  5.1
Ultra-Dense Prompting  Average across M...    2026.04  4.9
Ultra-Dense Prompting  GPT-4o                 2026.04  4.8
Ultra-Dense Prompting  GPT-4o-mini            2026.04  4.2