Tutor Robustness Evaluation on Multi-Agent Student
[Chart: Student Leakage by method. Selectable metrics: Student Leakage, Student Turns, Tutor Leakage, Tutor Turns. Highlighted point: MathDial-SFT, Student Leakage 11, Apr 20, 2026.]
Updated 1mo ago
Evaluation Results
Method        Setting                    Date     Student Leakage  Student Turns  Tutor Leakage  Tutor Turns
MathDial-SFT  Tutor Setting=Base in-...  2026.04  11               5.73           41             5.96
SocraticLM    Tutor Setting=Base in-...  2026.04  12               5.65           50             4.04
TutorRL-7B    Tutor Setting=Base in-...  2026.04  25               5.72           53             8.43