AITutor-EvalKit: Exploring the Capabilities of AI Tutors
About
We present AITutor-EvalKit, an application that uses language technology to evaluate the pedagogical quality of AI tutors and provides software for demonstration and evaluation, as well as for model inspection and data visualization. The tool is aimed at education stakeholders as well as the *ACL community at large, as it supports learning and can also be used to collect user feedback and annotations.
Numaan Naeem, Kaushal Kumar Maurya, Kseniia Petukhova, Ekaterina Kochmar• 2025
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Automated evaluation of tutor responses | MRBench extended (test) | Macro-F1 | 0.601 | 5 |
| Automated evaluation of tutor responses | Kochmar et al. 2025 (test) | Accuracy | 0.72 | 3 |
| Automated evaluation of tutor responses | Kochmar 2025 (demonstration set) | Accuracy (%) | 68 | 3 |
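The table reports two different metrics, Macro-F1 and accuracy, so the numbers are not directly comparable across rows. The sketch below illustrates how the two are typically computed for a classification-style evaluation of tutor responses. The listing does not state the label set or evaluation protocol. The labels ("Yes", "To some extent", "No") and the gold/predicted data below are hypothetical toy values, not data or results from the paper.

```python
def _check(gold, pred):
    # Both label sequences must be non-empty and aligned item by item.
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")


def accuracy(gold, pred):
    """Fraction of items whose predicted label equals the gold label."""
    _check(gold, pred)
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every label seen in gold or pred.

    A class with no true positives (or an undefined precision/recall)
    contributes an F1 of 0.0.
    """
    _check(gold, pred)
    labels = sorted(set(gold) | set(pred))
    per_class = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class.append(f1)
    return sum(per_class) / len(per_class)


# Hypothetical toy annotations for six tutor responses (illustration only).
gold = ["Yes", "Yes", "No", "To some extent", "No", "Yes"]
pred = ["Yes", "No", "No", "Yes", "No", "Yes"]

print(round(accuracy(gold, pred), 3))  # 0.667
print(round(macro_f1(gold, pred), 3))  # 0.489
```

On this toy data, accuracy is 4/6, but Macro-F1 is lower because the rare "To some extent" class is never predicted correctly and still counts as one third of the average. Averaging over classes this way, rather than over items, is the usual reason a Macro-F1 score sits below the corresponding accuracy on imbalanced label distributions.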