Tutor Robustness Evaluation on Manually Defined Prompts

[Figure: Student Leakage for MathDial-SFT, Apr 20, 2026; displayed value 0]

Evaluation Results

Method    Links
2026.04
0-675.94
2026.04
0-642.31
2026.04
0-757.84