Multi-turn Conversation Evaluation on MT-Bench 1.0 (test)

Top result: DAR, GPT-4 Score 8.538

Updated 4mo ago

Evaluation Results

Method	Date	GPT-4 Score	(unlabeled)
DAR	2026.02	8.538	7.931
GRPO	2026.02	8.425	7.856
RLOO	2026.02	8.409	7.893
Iter-SFT	2026.02	8.378	7.838
Qwen2-7B-Instruct (π0)	2026.02	8.334	7.769