Mathematical Reasoning on GSM8K (test) (Accuracy, Token Count, Latency)
[Chart: Accuracy on GSM8K (test) over time. Best result: 94.7 (VTC-R1), Jan 29, 2026. Selectable metrics: Accuracy, Token Count, Latency.]
Updated 4d ago
Evaluation Results

| Method | Backbone | Date | Accuracy | Token Count | Latency |
|---|---|---|---|---|---|
| VTC-R1 | Qwen3-VL-8B | 2026.01 | 94.7 | 1.09 | 0.46 |
| VTC-R1 | Glyph | 2026.01 | 93.6 | 1.09 | 0.34 |
| SFT | Qwen3-VL-8B | 2026.01 | 88.1 | 1.79 | 3.04 |
| SFT | Glyph | 2026.01 | 87.1 | 1.87 | 0.93 |
| TokenSkip | Glyph | 2026.01 | 86.4 | 2.25 | 1.32 |
| Base SFT | Glyph | 2026.01 | 86.1 | 2.35 | 1.38 |