
MathDial

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Mathematical Dialogue Evaluation | MathDial (test) | Accuracy 21 | 7 |
| Tutor Leakage Evaluation | MathDial fine-tuned adversarial student setting (test) | Tutor Leakage 8 | 3 |
| Dialogue Quality Evaluation | MathDial | BF1 (qt, at) 0.46 | 1 |
| Mathematical Dialogue | MathDial | Metric - | 0 |