Clarifying Questions on SciQA (test)
[Chart: Accuracy over time. Top result: Swift, 26 (Jun 8, 2025)]
Evaluation Results
Method       Mode           Date     Accuracy  Thinking  R Overall  R Accuracy  R Reasoning  R Comprehensive  R Pedagogic  R Confidence
Swift        Thinking Mode  2025.06  26        62        4.63       5.77        5.39         4.44             5.07         4.96
Refit        Thinking Mode  2025.06  24        68        5.03       6.21        5.81         4.75             5.42         5.33
STaR-GATE-D  Thinking Mode  2025.06  14        16        4.01       4.95        4.76         3.67             4.3          4.65
DPO          Thinking Mode  2025.06  8         24        4.44       5.53        5.17         4.19             4.86         4.86
StepDPO      Thinking Mode  2025.06  8         34        4.7        5.85        5.36         4.44             5.09         5.13
Base         Thinking Mode  2025.06  6         86        5.47       6.69        6.14         5.1              5.95         5.87