Mathematical Reasoning on AIME24 (Full Len., Core Len., CR, RM, Top-3 Mass, Retention)
[Chart: Full Length Score by date. Highlighted point: Minimal-core extraction, 11.4 (May 14, 2026). Selectable metrics: Full Length Score, Core Length Score, CR, RM, Top-3 Mass, Retention.]
Evaluation Results
| Method | Model | Date | Full Length Score | Core Length Score | CR | RM | Top-3 Mass | Retention |
|---|---|---|---|---|---|---|---|---|
| Minimal-core extraction | GPT-5 | 2026.05 | 11.4 | 6 | 53 | 47 | 64 | 85 |
| Minimal-core extraction | DeepSeek-R1-Dist... | 2026.05 | 10.9 | 6.3 | 58 | 42 | 59 | 83 |
| Minimal-core extraction | Qwen3-32B | 2026.05 | 10.5 | 6.5 | 62 | 38 | 56 | 81 |
| Minimal-core extraction | DeepSeek-R1-Dist... | 2026.05 | 10.2 | 6.7 | 66 | 34 | 53 | 77 |
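The page does not define CR or RM. The reported figures are, however, consistent with one simple reading: CR is the Core Length Score as a rounded percentage of the Full Length Score, and RM is its complement (100 minus CR). The sketch below checks that reading against the four rows of the Evaluation Results table. The function names `core_ratio` and `check_rows` are hypothetical, and the formula is an assumption inferred from the numbers, not a definition given by the benchmark.

```python
# Rows copied from the Evaluation Results table:
# (model label as shown on the page, Full Length Score, Core Length Score, CR, RM)
ROWS = [
    ("GPT-5", 11.4, 6.0, 53, 47),
    ("DeepSeek-R1-Dist...", 10.9, 6.3, 58, 42),
    ("Qwen3-32B", 10.5, 6.5, 62, 38),
    ("DeepSeek-R1-Dist...", 10.2, 6.7, 66, 34),
]


def core_ratio(full_len: float, core_len: float) -> int:
    """ASSUMED definition: core length as a rounded percentage of full length."""
    return round(100 * core_len / full_len)


def check_rows(rows):
    """Return (label, cr_matches, rm_matches) for each row under the assumed formulas."""
    results = []
    for label, full_len, core_len, cr, rm in rows:
        cr_est = core_ratio(full_len, core_len)
        rm_est = 100 - cr_est  # ASSUMED: RM is the complement of CR
        results.append((label, cr_est == cr, rm_est == rm))
    return results


if __name__ == "__main__":
    for label, cr_ok, rm_ok in check_rows(ROWS):
        print(f"{label}: CR matches={cr_ok}, RM matches={rm_ok}")
```

If the assumption holds, every row reports a match for both columns. A mismatch on future rows would indicate that CR and RM are computed differently, for example per problem before averaging.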