Multi-turn Conversation Evaluation on EduFeedback alternate
[Chart: MT-Bench Score. Top result: SFT, 8.3, May 22, 2026]
Evaluation Results
| Method | Base Model | Date | MT-Bench Score |
|---|---|---|---|
| SFT | DOLPHIN-7B | 2026.05 | 8.3 |
| DPO | DOLPHIN-7B | 2026.05 | 8.3 |
| SFT | MISTRAL-7B | 2026.05 | 8.1 |
| COALA | DOLPHIN-7B | 2026.05 | 8.1 |
| COALA | LLAMA-8B | 2026.05 | 8.0 |
| SFT | LLAMA-8B | 2026.05 | 7.9 |
| DPO | LLAMA-8B | 2026.05 | 7.9 |
| ORPO | DOLPHIN-7B | 2026.05 | 7.9 |
| ORPO | LLAMA-8B | 2026.05 | 7.8 |
| COALA | MISTRAL-7B | 2026.05 | 6.9 |
| ORPO | MISTRAL-7B | 2026.05 | 6.8 |
| DPO | MISTRAL-7B | 2026.05 | 6.1 |
| SFT | GPT-2 | 2026.05 | 1.7 |
| COALA | DISTILGPT | 2026.05 | 1.7 |
| SFT | DISTILGPT | 2026.05 | 1.6 |
| ORPO | GPT-2 | 2026.05 | 1.6 |
| COALA | GPT-2 | 2026.05 | 1.6 |
| DPO | GPT-2 | 2026.05 | 1.3 |
| DPO | DISTILGPT | 2026.05 | 1.0 |
| ORPO | DISTILGPT | 2026.05 | 1.0 |