Multi-turn Conversation Evaluation on UltraFeedback

MT-Bench Score: 6.1

COALA

[Chart: MT-Bench Score, axis ticks 0.796, 2.173, 3.55, 4.927; date shown: May 22, 2026]
Updated 9d ago

Evaluation Results

Date      MT-Bench Score
2026.05   6.1
2026.05   6
2026.05   5.5
2026.05   5.4
2026.05   5
2026.05   4.4
2026.05   3.9
2026.05   3.6
2026.05   3.5
2026.05   2.7
2026.05   1.9
2026.05   1.7
2026.05   1.6
2026.05   1.4
2026.05   1.4
2026.05   1.2
2026.05   1.2
2026.05   1.2
2026.05   1.1
2026.05   1