Mathematical Reasoning on MATH 500 (Accuracy, TNFT, Tokens, Delay)
[Chart: MATH 500 results over time; selectable metrics: Accuracy, TNFT (s), Tokens Generated, Inference Delay (ms). Top Accuracy: Ours, 64 (May 12, 2026).]
Evaluation Results
Method | Backbone | Date | Accuracy | TNFT (s) | Tokens Generated | Inference Delay (ms)
Ours | Qwen3-4B | 2026.05 | 64 | 0 | 742.26 | 26.51
Vanilla | Qwen3-4B | 2026.05 | 60.8 | 103.16 | 1,363.95 | 45.82
Base | Qwen3-4B | 2026.05 | 60 | 130.1 | 3,678.51 | 126.9
Ours | Qwen3-1.7B | 2026.05 | 51.6 | 0 | 803 | 22.94
Base | Qwen3-1.7B | 2026.05 | 48.4 | 130.1 | 3,229.16 | 88.15
Vanilla | Qwen3-1.7B | 2026.05 | 48.2 | 103.16 | 1,612.76 | 43.14
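To make the gaps between rows easier to read, the sketch below recomputes relative differences directly from the table values: accuracy gain in points, percentage reduction in tokens generated, and inference-delay speedup. Treating `Base` as the reference row, and the names `RESULTS` and `compare`, are illustrative assumptions; the leaderboard itself does not designate a baseline. TNFT is left out because the page does not define it and two rows report 0.

```python
# Figures transcribed from the MATH 500 evaluation table, grouped by backbone.
# Using "Base" as the reference row is an assumption made for illustration only.
RESULTS = {
    "Qwen3-4B": {
        "Ours":    {"accuracy": 64.0, "tokens": 742.26,  "delay_ms": 26.51},
        "Vanilla": {"accuracy": 60.8, "tokens": 1363.95, "delay_ms": 45.82},
        "Base":    {"accuracy": 60.0, "tokens": 3678.51, "delay_ms": 126.9},
    },
    "Qwen3-1.7B": {
        "Ours":    {"accuracy": 51.6, "tokens": 803.0,   "delay_ms": 22.94},
        "Vanilla": {"accuracy": 48.2, "tokens": 1612.76, "delay_ms": 43.14},
        "Base":    {"accuracy": 48.4, "tokens": 3229.16, "delay_ms": 88.15},
    },
}


def compare(backbone: str, method: str, baseline: str = "Base") -> dict:
    """Accuracy gain (points), token reduction (%) and delay speedup (x) vs. baseline."""
    m = RESULTS[backbone][method]
    b = RESULTS[backbone][baseline]
    return {
        # Difference in accuracy points (positive means the method scores higher).
        "accuracy_gain": m["accuracy"] - b["accuracy"],
        # Share of the baseline's generated tokens that the method avoids.
        "token_reduction_pct": 100.0 * (1.0 - m["tokens"] / b["tokens"]),
        # How many times lower the method's inference delay is than the baseline's.
        "delay_speedup": b["delay_ms"] / m["delay_ms"],
    }


if __name__ == "__main__":
    for backbone in RESULTS:
        for method in ("Ours", "Vanilla"):
            r = compare(backbone, method)
            print(
                f"{backbone:<11} {method:<8} "
                f"acc {r['accuracy_gain']:+.1f} pts, "
                f"tokens -{r['token_reduction_pct']:.1f}%, "
                f"delay {r['delay_speedup']:.2f}x faster"
            )
```

Keeping the transcribed numbers in one dictionary means a different reference row (for example `baseline="Vanilla"`) can be compared without changing the arithmetic.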