General Utility Evaluation on MT_Bench
[Chart: Agreement Rate by date. Best result: 82.7 (C2PO), Dec 29, 2025]
Evaluation Results
Method  Backbone          Date     Agreement Rate
C2PO    DeepSeek-R1-D...  2025.12  82.7
CPO     LLaMA-2-13B-Chat  2025.12  79.9
C2PO    LLaMA-2-13B-Chat  2025.12  79.9
CPO     DeepSeek-R1-D...  2025.12  77.1
IPO     DeepSeek-R1-D...  2025.12  74.2
IPO     LLaMA-2-13B-Chat  2025.12  67.3
GRPO    DeepSeek-R1-D...  2025.12  67
BCO     DeepSeek-R1-D...  2025.12  64.7
DPO     DeepSeek-R1-D...  2025.12  64.2
BCO     LLaMA-2-13B-Chat  2025.12  60.3
DPO     LLaMA-2-13B-Chat  2025.12  48.5
FR      LLaMA-2-13B-Chat  2025.12  45.4
FR      DeepSeek-R1-D...  2025.12  39.7
GRPO    LLaMA-2-13B-Chat  2025.12  33.5