MT-Eval

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Multi-turn dialogue evaluation | MT-Eval | Expansion Score: 7.34 | 9
Multi-turn conversation | MT-Eval | Accuracy: 8.28 | 9