
MRBench

Benchmarks

| Task Name | Dataset Name | SOTA Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Role-playing evaluation | MRBench Chinese 1.0 | Memory Adherence (SI) | 8.85 | 12 |
| Role-playing evaluation | MRBench English 1.0 | MA-SI Score | 9.13 | 12 |
| Memory-Driven Role-Play | MRBench | MA Score | 8.43 | 8 |
| Pedagogical Dialogue Classification | MRBench (test) | Mistake ID Acc | 91 | 7 |
| Automated evaluation of tutor responses | MRBench extended (test) | Macro-F1 | 0.646 | 5 |