Multi-turn Conversation Evaluation on IMDb
[Chart: MT-Bench Score by date. Leading result: COALA, 5.6 (May 22, 2026).]
Evaluation Results
| Method | Base Model | Date | MT-Bench Score |
|---|---|---|---|
| COALA | DOLPHIN-7B | 2026.05 | 5.6 |
| ORPO | DOLPHIN-7B | 2026.05 | 5.5 |
| DPO | DOLPHIN-7B | 2026.05 | 5.1 |
| SFT | LLAMA-8B | 2026.05 | 3.5 |
| COALA | LLAMA-8B | 2026.05 | 3.5 |
| SFT | DOLPHIN-7B | 2026.05 | 3.2 |
| DPO | LLAMA-8B | 2026.05 | 3.1 |
| ORPO | LLAMA-8B | 2026.05 | 2.9 |
| COALA | MISTRAL-7B | 2026.05 | 2.9 |
| DPO | MISTRAL-7B | 2026.05 | 2.8 |
| SFT | MISTRAL-7B | 2026.05 | 2.3 |
| ORPO | MISTRAL-7B | 2026.05 | 1.7 |
| DPO | DISTILGPT | 2026.05 | 1.6 |
| SFT | GPT-2 | 2026.05 | 1.4 |
| SFT | DISTILGPT | 2026.05 | 1.2 |
| ORPO | DISTILGPT | 2026.05 | 1.2 |
| COALA | DISTILGPT | 2026.05 | 1.2 |
| ORPO | GPT-2 | 2026.05 | 1.1 |
| COALA | GPT-2 | 2026.05 | 1.1 |
| DPO | GPT-2 | 2026.05 | 1 |