Multi-turn Dialogue Evaluation on MT_Bench CodaSet OOD (test)

Performance (%): 98.16

Qwen3-235B

May 25, 2026
Updated 8d ago

Evaluation Results

Method  Links
2026.05
98.16  3.1
98.16  2.9
2026.05
98.16  2.7
97.24  5.3
2026.05
97.24  4.7
95.53  2.4
95.53  1.2
95.26  3.7
2026.05
95  2.4
2026.05
94.74  3.2
94.47  7.9
2026.05
94.34  1.7
2026.05
94.21  2
2026.05
93.95  1.4
93.68  1.2
92.37  1