Multi-turn Dialogue Evaluation on MT-Bench (MT-Bench Score, ŝu, ŝw)

Best result: GPT-4, MT-Bench Score 8.99

Updated 4mo ago

Evaluation Results

Method	Links	MT-Bench Score	ŝu	ŝw
GPT-4 2026.01		8.99	0.83	0.73
GPT-3.5 2026.01		7.94	0.51	0.43
Vicuna-13B (all) 2026.01		6.39	-0.29	-0.25
Alpaca-13B 2026.01		4.53	-0.62	-0.54
LLaMA-13B 2026.01		2.61	-1.27	-1.12