Mathematical Reasoning on Math dataset
[Chart: Accuracy by date. Highlighted result: 85.7 (Cloud-Only), May 29, 2026. Selectable metrics: Accuracy, Cloud Call Count, Average Latency (s), Average Token Usage.]
Evaluation Results
| Method | Configuration | Date | Accuracy | Cloud Call Count | Average Latency (s) | Average Token Usage |
|---|---|---|---|---|---|---|
| Cloud-Only | Cloud Model=Qwen3-235B... | 2026.05 | 85.7 | - | - | - |
| CSA-Based | Local Model=Qwen3-4B,... | 2026.05 | 80 | 73 | 0.66 | 137 |
| CSA-Based | Local Model=Qwen3-1.7B... | 2026.05 | 73.3 | 96 | 1.01 | 320 |
| Vanilla | Local Model=Qwen3-4B,... | 2026.05 | 70.2 | - | - | - |
| Local-Only | Local Model=Qwen3-4B | 2026.05 | 69.4 | - | - | - |
| Vanilla | Local Model=Qwen3-1.7B... | 2026.05 | 64.1 | - | - | - |
| Local-Only | Local Model=Qwen3-1.7B | 2026.05 | 60 | - | - | - |
| CSA-Based | Local Model=Qwen3-0.6B... | 2026.05 | 57.3 | 156 | 0.19 | 79 |
| Vanilla | Local Model=Qwen3-0.6B... | 2026.05 | 39.6 | - | - | - |
| Local-Only | Local Model=Qwen3-0.6B | 2026.05 | 35.9 | - | - | - |
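One way to read the table is to compare each CSA-Based row with the Local-Only and Cloud-Only rows that use the same local model. The sketch below does that arithmetic. It is a hypothetical helper and not part of the benchmark. The names `ACCURACY`, `gap_closed`, and `summarize` are illustrative, and the "gap closed" ratio is a derived quantity that the leaderboard does not report. The accuracy values are copied from the table. Rows are keyed only by local model name because the configuration strings are truncated.

```python
"""Derived comparisons from the Evaluation Results table (hypothetical helper).

Accuracy values are copied from the leaderboard; the derived metrics below are
illustrative and not reported by the benchmark itself.
"""

CLOUD_ONLY_ACC = 85.7  # Cloud-Only row (Cloud Model=Qwen3-235B...)

# Accuracy per local model, copied from the table rows.
ACCURACY: dict[str, dict[str, float]] = {
    "Qwen3-4B": {"Local-Only": 69.4, "Vanilla": 70.2, "CSA-Based": 80.0},
    "Qwen3-1.7B": {"Local-Only": 60.0, "Vanilla": 64.1, "CSA-Based": 73.3},
    "Qwen3-0.6B": {"Local-Only": 35.9, "Vanilla": 39.6, "CSA-Based": 57.3},
}


def gap_closed(local_only: float, method: float, cloud_only: float) -> float:
    """Fraction of the (cloud_only - local_only) accuracy gap recovered by `method`."""
    return (method - local_only) / (cloud_only - local_only)


def summarize() -> dict[str, dict[str, float]]:
    """For each local model, compare CSA-Based with Local-Only and Cloud-Only."""
    summary: dict[str, dict[str, float]] = {}
    for model, acc in ACCURACY.items():
        summary[model] = {
            # Accuracy points gained over running the local model alone.
            "csa_gain_over_local": round(acc["CSA-Based"] - acc["Local-Only"], 1),
            # Accuracy points still behind the Cloud-Only baseline.
            "csa_gap_to_cloud": round(CLOUD_ONLY_ACC - acc["CSA-Based"], 1),
            # Share of the Local-Only -> Cloud-Only gap that CSA-Based closes.
            "csa_gap_closed": round(
                gap_closed(acc["Local-Only"], acc["CSA-Based"], CLOUD_ONLY_ACC), 2
            ),
        }
    return summary


if __name__ == "__main__":
    for model, stats in summarize().items():
        print(model, stats)
```

With these numbers, CSA-Based with Qwen3-4B sits 5.7 points below Cloud-Only and closes about 65% of the Local-Only gap. For Qwen3-0.6B the absolute gain is larger (21.4 points), but the model still trails Cloud-Only by 28.4 points. The table reports cloud call counts and latency only for the CSA-Based rows, so the sketch compares accuracy alone.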