Multi-turn Conversation on MT-Bench (AVG Metric)
[Chart: Average Score over time. Top result: CoDIT-Qwen3-8B, 85.25 (Apr 15, 2026)]
Evaluation Results
Method | Setting | Links | Average Score
--- | --- | --- | ---
CoDIT-Qwen3-8B | Student Model=Qwen3-8B... | 2026.04 | 85.25
CoDIT-Qwen3-30B | Student Model=Qwen3-8B... | 2026.04 | 84.18
CoDIT-Gemma3 | Student Model=Qwen3-8B... | 2026.04 | 83.13
WebR-Pro | Student Model=Qwen3-8B... | 2026.04 | 82.72
Llama-3.1-LMSYS-Chat-1M-Synth | Student Model=Qwen3-8B... | 2026.04 | 80.05
WebR-Basic | Student Model=Qwen3-8B... | 2026.04 | 75.43
CoDIT-Qwen3-8B | Student Model=Llama-3.... | 2026.04 | 75.39
Gemma-2-LMSYS-Chat-1M-Synth | Student Model=Qwen3-8B... | 2026.04 | 74.56
CoDIT-Gemma3 | Student Model=Llama-3.... | 2026.04 | 74.33
Magpie-Pro-300K-Filtered | Student Model=Qwen3-8B... | 2026.04 | 74.05
CoDIT-Qwen3-30B | Student Model=Llama-3.... | 2026.04 | 73.73
Llama-3.1-LMSYS-Chat-1M-Synth | Student Model=Llama-3.... | 2026.04 | 72.11
WildChat | Student Model=Qwen3-8B... | 2026.04 | 70.04
WebR-Pro | Student Model=Llama-3.... | 2026.04 | 70.00
Gemma-2-LMSYS-Chat-1M-Synth | Student Model=Llama-3.... | 2026.04 | 69.51
WebR-Basic | Student Model=Llama-3.... | 2026.04 | 64.31
Magpie-Pro-300K-Filtered | Student Model=Llama-3.... | 2026.04 | 63.15
WildChat | Student Model=Llama-3.... | 2026.04 | 62.24
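Each method appears twice in the results, once per student model, so the scores can be compared pairwise. The Python sketch below is only an illustration and is not part of the benchmark. It hard-codes the Average Score values listed above, groups them by student model, and computes each method's score gap between the two students. The full student-model names are truncated on the page, so the labels `Qwen3-8B` and `Llama-3` are just the visible prefixes. The helper names `scores_by_student` and `student_gap` are hypothetical.

```python
from collections import defaultdict

# (method, student-model label, Average Score), copied from the results list.
# "Llama-3" is only the visible prefix of a truncated name on the page.
ROWS = [
    ("CoDIT-Qwen3-8B", "Qwen3-8B", 85.25),
    ("CoDIT-Qwen3-30B", "Qwen3-8B", 84.18),
    ("CoDIT-Gemma3", "Qwen3-8B", 83.13),
    ("WebR-Pro", "Qwen3-8B", 82.72),
    ("Llama-3.1-LMSYS-Chat-1M-Synth", "Qwen3-8B", 80.05),
    ("WebR-Basic", "Qwen3-8B", 75.43),
    ("Gemma-2-LMSYS-Chat-1M-Synth", "Qwen3-8B", 74.56),
    ("Magpie-Pro-300K-Filtered", "Qwen3-8B", 74.05),
    ("WildChat", "Qwen3-8B", 70.04),
    ("CoDIT-Qwen3-8B", "Llama-3", 75.39),
    ("CoDIT-Gemma3", "Llama-3", 74.33),
    ("CoDIT-Qwen3-30B", "Llama-3", 73.73),
    ("Llama-3.1-LMSYS-Chat-1M-Synth", "Llama-3", 72.11),
    ("WebR-Pro", "Llama-3", 70.00),
    ("Gemma-2-LMSYS-Chat-1M-Synth", "Llama-3", 69.51),
    ("WebR-Basic", "Llama-3", 64.31),
    ("Magpie-Pro-300K-Filtered", "Llama-3", 63.15),
    ("WildChat", "Llama-3", 62.24),
]


def scores_by_student(rows):
    """Hypothetical helper: group scores as {student: {method: score}}."""
    table = defaultdict(dict)
    for method, student, score in rows:
        table[student][method] = score
    return table


def student_gap(table, a="Qwen3-8B", b="Llama-3"):
    """Hypothetical helper: score(a) - score(b) for methods run with both students."""
    common = table[a].keys() & table[b].keys()
    # Round to two decimals to match the precision reported on the page.
    return {m: round(table[a][m] - table[b][m], 2) for m in sorted(common)}


if __name__ == "__main__":
    table = scores_by_student(ROWS)
    # Methods ranked by score within each student model.
    for student, scores in table.items():
        ranked = sorted(scores, key=scores.get, reverse=True)
        print(student, ranked)
    # Per-method gap, largest first.
    gaps = student_gap(table)
    for method in sorted(gaps, key=gaps.get, reverse=True):
        print(f"{method}: {gaps[method]:+.2f}")
```

With the listed numbers, every method scores higher with the Qwen3-8B student, by between 5.05 points (Gemma-2-LMSYS-Chat-1M-Synth) and 12.72 points (WebR-Pro). The ranking also shifts slightly between students: CoDIT-Qwen3-30B is ahead of CoDIT-Gemma3 with Qwen3-8B, but behind it with Llama-3.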