Math Reasoning on MultiArith (Accuracy and Reasoning Length)
[Chart: Accuracy and Reasoning Length, Jan 21, 2026 to Jan 30, 2026. Top accuracy: 98.3 (SFT-CoT).]
Updated 4d ago
Evaluation Results
| Method | Configuration | Date | Accuracy | Reasoning Length |
|---|---|---|---|---|
| SFT-CoT | Model=Qwen3-VL-4B-Inst... | 2026.01 | 98.3 | 59.1 |
| RoT (Ours) | Model=Qwen3-VL-4B-Inst... | 2026.01 | 97.2 | 32 |
| SFT-CoT | Model=Qwen3-VL-2B-Inst... | 2026.01 | 95 | 68 |
| ReGuLaR | Backbone=LLaMA-3.2-1B-... | 2026.01 | 89.2 | 2.28 |
| CoLaR | Backbone=LLaMA-3.2-1B-... | 2026.01 | 87 | 3.23 |
| SFT-CoT | Model=LLaVa-V1.6-Mistr... | 2026.01 | 86.1 | 82.3 |
| SFT-w/o CoT | Model=Qwen3-VL-4B-Inst... | 2026.01 | 85.6 | 0 |
| RoT (Ours) | Model=LLaVa-V1.6-Mistr... | 2026.01 | 68.3 | 32 |
| RoT (Ours) | Model=Qwen3-VL-2B-Inst... | 2026.01 | 62.2 | 32 |
| SFT-w/o CoT | Model=LLaVa-V1.6-Mistr... | 2026.01 | 52.8 | 0 |
| SFT-w/o CoT | Model=Qwen3-VL-2B-Inst... | 2026.01 | 41.7 | 0 |
| Coconut | Backbone=LLaMA-3.2-1B-... | 2026.01 | 41.4 | 6 |
| iCoT | Backbone=LLaMA-3.2-1B-... | 2026.01 | 38.2 | 0 |
| CODI | Backbone=LLaMA-3.2-1B-... | 2026.01 | 19.2 | 6 |