Multi-turn Dialogue Evaluation on MT-Bench-101
[Chart: Normalized Score and Discriminability by model. Highlighted point: Qwen3-Max-Thinking, Normalized Score 93.62, Mar 24, 2026.]
Evaluation Results
Model               Method                     Date     Normalized Score  Discriminability
Qwen3-Max-Thinking  formatting=multi-turn,...  2026.03  93.62             3
DeepSeek-V3.2       formatting=multi-turn,...  2026.03  91.17             3
Kimi-K2.5           formatting=multi-turn,...  2026.03  90.77             3
MiniMax-M2.5        formatting=multi-turn,...  2026.03  89.98             3
GLM-5               formatting=multi-turn,...  2026.03  85.47             3