Mathematical Dialogue Evaluation on MathDial (test)
[Chart: Accuracy by date. Highlighted entry: DPO, Accuracy 21 (Jun 8, 2025). Metrics available: Accuracy, Thinking Rate, R Overall, R Accuracy, R Reasoning, R Comprehensive, R Pedagogic, R Confidence.]
Updated 3d ago
Evaluation Results
| Method | Mode | Date | Accuracy | Thinking Rate | R Overall | R Accuracy | R Reasoning | R Comprehensive | R Pedagogic | R Confidence |
|---|---|---|---|---|---|---|---|---|---|---|
| DPO | Standard Mode | 2025.06 | 21 | 0 | 1.85 | 1.9 | 2.41 | 2 | 1.77 | 1.58 |
| StepDPO | Standard Mode | 2025.06 | 20.67 | 0 | 1.86 | 1.87 | 2.43 | 2.03 | 1.78 | 1.55 |
| STaR-GATE-D | Standard Mode | 2025.06 | 20 | 0 | 1.55 | 1.63 | 1.95 | 1.79 | 1.51 | 1.31 |
| STaR-GATE | Standard Mode | 2025.06 | 10.67 | 0 | 2.29 | 2.25 | 2.94 | 2.58 | 2.3 | 1.56 |
| Swift | Standard Mode | 2025.06 | 9.67 | 0 | 2.43 | 2.41 | 3.09 | 2.68 | 2.45 | 1.66 |
| Refit | Standard Mode | 2025.06 | 6 | 0 | 2.43 | 2.5 | 3.15 | 2.68 | 2.41 | 1.63 |
| Base | Standard Mode | 2025.06 | 0 | 0 | 1.9 | 2.2 | 2.54 | 1.81 | 2.01 | 1.2 |