Multi-turn Chat Evaluation on MT-Bench
Best result: VRM-PPO, MT-Bench Score 8.58
[Figure: MT-Bench Score vs. date; results plotted at Mar 5, 2026]
Evaluation Results
Method          Backbone     Date     MT-Bench Score
VRM-PPO         Qwen3-8B     2026.03  8.58
PPO             Qwen3-8B     2026.03  8.44
VRM-PPO         Qwen2.5-7B   2026.03  7.98
KTO             Qwen3-8B     2026.03  7.97
DPO             Qwen3-8B     2026.03  7.86
DPO             Qwen2.5-7B   2026.03  7.83
WPO             Qwen2.5-7B   2026.03  7.81
PPO             Qwen2.5-7B   2026.03  7.81
SELECTIVE DPO   Qwen2.5-7B   2026.03  7.74
IPO             Qwen2.5-7B   2026.03  7.64
KTO             Qwen2.5-7B   2026.03  7.63
SIMPO           Qwen2.5-7B   2026.03  7.58
IPO             Qwen3-8B     2026.03  6.55
WPO             Qwen3-8B     2026.03  6.38
SIMPO           Qwen3-8B     2026.03  6.24
SELECTIVE DPO   Qwen3-8B     2026.03  6.13
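Every method on this leaderboard is reported on both backbones, so the effect of switching from Qwen2.5-7B to Qwen3-8B can be read off per method. The short Python sketch below does that pairing. The `SCORES` dictionary and the `backbone_deltas` helper are illustrative names, not part of the leaderboard; the values are copied by hand from the entries above and assume each method appears exactly once per backbone.

```python
# Illustrative data (hypothetical names): MT-Bench scores transcribed by hand
# from the leaderboard, as method -> (Qwen2.5-7B score, Qwen3-8B score).
SCORES = {
    "VRM-PPO": (7.98, 8.58),
    "PPO": (7.81, 8.44),
    "KTO": (7.63, 7.97),
    "DPO": (7.83, 7.86),
    "WPO": (7.81, 6.38),
    "SELECTIVE DPO": (7.74, 6.13),
    "IPO": (7.64, 6.55),
    "SIMPO": (7.58, 6.24),
}


def backbone_deltas(scores):
    """Return (method, Qwen3-8B minus Qwen2.5-7B) pairs, largest gain first.

    Deltas are rounded to two decimals to match the leaderboard's precision
    and to hide floating-point noise from the subtraction.
    """
    deltas = {method: round(new - old, 2) for method, (old, new) in scores.items()}
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for method, delta in backbone_deltas(SCORES):
        # Signed format makes gains (+) and drops (-) easy to scan.
        print(f"{method:<14} {delta:+.2f}")
```

Run as a script, it lists the methods from largest gain to largest drop. On these numbers PPO, VRM-PPO, KTO and DPO score higher with the Qwen3-8B backbone, while IPO, SIMPO, WPO and SELECTIVE DPO score lower.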