Mathematical Reasoning on AIME 25 (P@4, #Tok)
[Chart: P@4 Accuracy and Token Count over time. Leading result: MPD, 56.7 P@4 Accuracy, May 9, 2026.]
Updated 22d ago
Evaluation Results
Method          Date     P@4 Accuracy  Token Count
MPD             2026.05  56.7          12,900
CRISP           2026.05  56.6          16,300
Vanilla LLM     2026.05  53.3          17,700
Direct Comp.    2026.05  50            15,200
Chain-of-Draft  2026.05  50            13,000
LiteCoT         2026.05  46.7          14,600