
MT-Eval

Benchmarks

Task Name                            Dataset Name  SOTA Result           Trend
Multi-turn dialogue                  MT-Eval       LLM-EVAL Score 8.16   20
Multi-turn Instruction Following     MT-Eval       CSR 93.62             20
Multi-turn dialogue evaluation       MT-Eval       Expansion Score 7.34  9
Multi-turn conversation              MT-Eval       Accuracy 8.28         9
Structured Reasoning and Evaluation  MT-Eval       CSR 95.56             8