| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Role-playing evaluation | MRBench Chinese 1.0 | Memory Adherence (SI): 8.85 | 12 |
| Role-playing evaluation | MRBench English 1.0 | MA-SI Score: 9.13 | 12 |
| Memory-Driven Role-Play | MRBench | MA Score: 8.43 | 8 |
| Pedagogical Dialogue Classification | MRBench (test) | Mistake ID Acc: 91 | 7 |
| Automated evaluation of tutor responses | MRBench extended (test) | Macro-F1: 0.646 | 5 |