Multi-turn Conversation Evaluation on MT-Bench 1.0 (test)
[Chart: GPT-4 Score by date; top point is DAR at 8.538 (Feb 12, 2026). Series: GPT-4 Score, GPT-4-Turbo Score.]
Updated 4d ago
Evaluation Results
| Method | Response Length | Date | GPT-4 Score | GPT-4-Turbo Score |
|---|---|---|---|---|
| DAR | 1358 | 2026.02 | 8.538 | 7.931 |
| GRPO | 1559 | 2026.02 | 8.425 | 7.856 |
| RLOO | 1580 | 2026.02 | 8.409 | 7.893 |
| Iter-SFT | 1343 | 2026.02 | 8.378 | 7.838 |
| Qwen2-7B-Instruct (π0) | 1340 | 2026.02 | 8.334 | 7.769 |