Mathematical Reasoning on AIME 25 (Mean@32)
[Chart: Mean@32 over time, Aug 25, 2025 to Feb 11, 2026. Top entry: FlashMLA, 89.58]
Updated 5d ago
Evaluation Results
| Method | Backbone | Date | Mean@32 |
|---|---|---|---|
| FlashMLA | LongCat-Flash... | 2026.02 | 89.58 |
| SnapMLA | LongCat-Flash... | 2026.02 | 88.44 |
| FlashMLA | DeepSeek-V3.1... | 2026.02 | 87.92 |
| SnapMLA | DeepSeek-V3.1... | 2026.02 | 85.42 |
| SFT | Qwen2.5-7B-In... | 2025.08 | 23.02 |
| PSFT warm-up | Qwen2.5-7B-In... | 2025.08 | 23.02 |
| PSFT | Qwen2.5-7B-In... | 2025.08 | 21.98 |
| SFT-KL | Qwen2.5-7B-In... | 2025.08 | 21.56 |
| PSFT warm-up | Llama3.1-8B-I... | 2025.08 | 18.75 |
| SFT | Llama3.1-8B-I... | 2025.08 | 16.77 |
| SFT-KL | Llama3.1-8B-I... | 2025.08 | 16.15 |
| PSFT | Llama3.1-8B-I... | 2025.08 | 14.48 |
| Base | Qwen2.5-7B-In... | 2025.08 | 8.75 |
| Base | Llama3.1-8B-I... | 2025.08 | 0.73 |
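
The page does not define Mean@32. A common reading, assumed here, is the accuracy averaged over 32 sampled generations per problem and then over all problems, reported as a percentage. The sketch below illustrates that reading only; the function name `mean_at_k` and the input layout (one list of per-sample correctness flags per problem) are hypothetical and not taken from any listed method.

```python
from typing import Sequence


def mean_at_k(correct: Sequence[Sequence[bool]], k: int = 32) -> float:
    """Return Mean@k as a percentage.

    `correct[i][j]` is True when sample j for problem i was graded correct.
    Assumption: every problem has exactly k samples and all problems are
    weighted equally.
    """
    if not correct:
        raise ValueError("need at least one problem")
    per_problem = []
    for samples in correct:
        if len(samples) != k:
            raise ValueError(f"expected {k} samples per problem, got {len(samples)}")
        # Fraction of the k samples that were correct for this problem.
        per_problem.append(sum(samples) / k)
    # Average the per-problem fractions and scale to a percentage.
    return 100.0 * sum(per_problem) / len(per_problem)


# Toy usage with 2 problems and k=4 (not real benchmark data).
toy = [
    [True, True, False, False],   # 2 of 4 correct
    [True, False, False, False],  # 1 of 4 correct
]
toy_score = mean_at_k(toy, k=4)
```

Under this reading, and assuming the full 30-problem AIME 2025 set, a score is a multiple of 1/960 of 100 points, which is consistent with the listed values (for example, 7 correct samples out of 960 rounds to 0.73).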