
Multi-turn Dialogue Evaluation on MT-Bench-101

Normalized Score: 93.62

Qwen3-Max-Thinking

Updated 4mo ago

Evaluation Results

Method	Normalized Score	Links
Qwen3-Max-Thinking 2026.03	93.62	3
DeepSeek-V3.2 2026.03	91.17	3
Kimi-K2.5 2026.03	90.77	3
MiniMax-M2.5 2026.03	89.98	3
GLM-5 2026.03	85.47	3