Multi-turn Dialogue Evaluation on MT-Bench-101

93.62 Normalized Score

Qwen3-Max-Thinking

Mar 24, 2026
Updated 24d ago

Evaluation Results

Method  Links
2026.03  93.623
2026.03  91.173
2026.03  90.773
2026.03  89.983
2026.03  85.473