| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Mathematical Dialogue Evaluation | MathDial (test) | Accuracy: 21 | 7 |
| Tutor Leakage Evaluation | MathDial fine-tuned adversarial student setting (test) | Tutor Leakage: 8 | 3 |
| Dialogue Quality Evaluation | MathDial | BF1 (qt, at): 0.46 | 1 |
| Mathematical Dialogue | MathDial | Metric: - | 0 |