Multi-turn Dialogue Evaluation on MT-Bench (MT-Bench Score, ŝu, ŝw)

8.99 MT-Bench Score

GPT-4

[Chart: MT-Bench Score (y-axis) by date (x-axis); Jan 29, 2026]
Updated 4d ago

Evaluation Results

Date      MT-Bench Score     ŝu      ŝw
2026.01   8.99             0.83    0.73
2026.01   7.94             0.51    0.43
2026.01   6.39            -0.29   -0.25
2026.01   4.53            -0.62   -0.54
2026.01   2.61            -1.27   -1.12