Multi-turn Conversation Evaluation on UltraFeedback
Best MT-Bench Score: 6.1 (COALA)
[Chart: MT-Bench Score by date; data point shown at May 22, 2026]
Updated 9d ago
Evaluation Results
Method  Base Model  Date     MT-Bench Score
COALA   LLAMA-8B    2026.05  6.1
ORPO    LLAMA-8B    2026.05  6
COALA   DOLPHIN-7B  2026.05  5.5
ORPO    DOLPHIN-7B  2026.05  5.4
SFT     LLAMA-8B    2026.05  5
DPO     DOLPHIN-7B  2026.05  4.4
DPO     LLAMA-8B    2026.05  3.9
SFT     MISTRAL-7B  2026.05  3.6
COALA   MISTRAL-7B  2026.05  3.5
SFT     DOLPHIN-7B  2026.05  2.7
DPO     MISTRAL-7B  2026.05  1.9
SFT     GPT-2       2026.05  1.7
SFT     DISTILGPT   2026.05  1.6
DPO     GPT-2       2026.05  1.4
ORPO    DISTILGPT   2026.05  1.4
DPO     DISTILGPT   2026.05  1.2
ORPO    MISTRAL-7B  2026.05  1.2
COALA   GPT-2       2026.05  1.2
ORPO    GPT-2       2026.05  1.1
COALA   DISTILGPT   2026.05  1