Multi-turn Conversation Evaluation on MT-Bench 1.0 (test)

8.538 GPT-4 Score

DAR

[Chart: GPT-4 Score by date (Feb 12, 2026)]
Updated 4d ago

Evaluation Results

Method    Links
2026.02   8.538   7.931
2026.02   8.425   7.856
2026.02   8.409   7.893
2026.02   8.378   7.838
2026.02   8.334   7.769