Mathematical Reasoning on AIME24 (P@4, Avg Output Length)
[Chart: P@4 on AIME24 over time. Leading entry: MPD, 73.3 P@4, May 9, 2026. Selectable metrics: P@4, Avg Output Length (k tokens).]
Updated 22d ago
Evaluation Results
Method            Date      P@4    Avg Output Length (k tokens)
MPD               2026.05   73.3   13.4
Direct Comp.      2026.05   70     14.5
CRISP             2026.05   70     14.8
LiteCoT           2026.05   66.7   15.6
Vanilla LLM       2026.05   63.3   16.8
Chain-of-Draft    2026.05   63.3   11.8