
Automated evaluation of tutor responses on MRBench extended (test)

0.646 Macro-F1

BJTU

[Figure: Macro-F1 results over time, Dec 3, 2025]
Updated 3mo ago

Evaluation Results

Date      Macro-F1
2025.12   0.646
2025.12   0.643
2025.12   0.632
2025.12   0.625
2025.12   0.601