Multistep Reasoning on SpokenMQA
[Chart: Accuracy; best result 81.5, Baseline (Thinking); date shown: Dec 11, 2025; selectable metrics: Accuracy, Delay, TTFT. Updated 4d ago.]
Evaluation Results
Method               Details                      Date     Accuracy  Delay   TTFT
Baseline (Thinking)  Model Size=4B, Backbon...    2025.12  81.5      227.16  227.15
AsyncReasoning       Model Size=4B, Backbon...    2025.12  81        0.84    0.8
Baseline (No think)  Model Size=4B, Backbon...    2025.12  80.6      0.8     0.7
Baseline (Thinking)  Model Size=0.6B, Backb...    2025.12  70.4      201.53  201.52
AsyncReasoning       Model Size=0.6B, Backb...    2025.12  65.7      2.56    1.37
Baseline (No think)  Model Size=0.6B, Backb...    2025.12  56.8      0.72    0.67