
Chatbot Evaluation on MT-Bench (GPT-4-Turbo Score)

Score (GPT-4-Turbo): 9.5

Qwen3-32B + RLBFF training

[Chart: MT-Bench score (GPT-4-Turbo) over time; data point dated Sep 25, 2025]

Evaluation Results

Date      Score (GPT-4-Turbo)
2025.09   9.5
2025.09   9.49
2025.09   9.45
2025.09   9.38
2025.09   9.26
2025.09   8.93