Multi-task Knowledge Evaluation on MMLU-Pro
Best result: 9.1 Accuracy Improvement (Ultra-Dense Prompting)
[Chart: Accuracy Improvement by date; plotted point dated Apr 19, 2026]
Evaluation Results
| Method | Setting | Date | Accuracy Improvement |
|---|---|---|---|
| Ultra-Dense Prompting | Model=Gemini 2.0, Prom... | 2026.04 | 9.1 |
| Ultra-Dense Prompting | Model=Claude 3.7, Prom... | 2026.04 | 8.2 |
| Ultra-Dense Prompting | Model=Average across M... | 2026.04 | 8 |
| Ultra-Dense Prompting | Model=GPT-4o, Prompt S... | 2026.04 | 7.9 |
| Ultra-Dense Prompting | Model=GPT-4o-mini, Pro... | 2026.04 | 7 |