Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

MT_Bench

Benchmarks

Task NameDataset NameSOTA ResultTrend
General Utility EvaluationMT_Bench
Agreement Rate82.7
33
Multi-turn Dialogue EvaluationMT_Bench CodaSet OOD (test)
Performance (%)98.16
16
Showing 2 of 2 rows