Mathematical Reasoning on AIME 2025 (held-out)
Top result: 66.7 DBS (DataChef-32B)
[Chart: DBS and DVSavg@32 by date; date label shown: Feb 11, 2026]
Updated 4d ago
Evaluation Results
Method                                            Date     DBS   DVSavg@32
DataChef-32B (Oracle Upper Bound=tru...)          2026.02  66.7  -
SOURCEbest                                        2026.02  39.6  -
EXPERT (Model=Qwen3-1.7B optim...)                2026.02  33.3  -
Gemini-3-Pro                                      2026.02  30    80.7
DataChef-32B                                      2026.02  30    84.7
SOURCEavg                                         2026.02  23.4  -
Qwen3-Next ⊕ Kimi-K2 (Reasoning backbone=Qwe...)  2026.02  23.3  78.3
Kimi-K2                                           2026.02  20    35.4
Qwen3-32B                                         2026.02  13.3  31.5