General Utility Evaluation on Chatbot
[Chart: Agree Score by date. Highlighted point: C2PO, Agree Score 80, Dec 29, 2025.]
Evaluation Results
| Method | Backbone | Date | Agree Score |
|---|---|---|---|
| C2PO | LLaMA-2-13B-Chat | 2025.12 | 80 |
| C2PO | DeepSeek-R1-D... | 2025.12 | 77.8 |
| CPO | LLaMA-2-13B-Chat | 2025.12 | 76.1 |
| CPO | DeepSeek-R1-D... | 2025.12 | 75.7 |
| IPO | DeepSeek-R1-D... | 2025.12 | 67.7 |
| IPO | LLaMA-2-13B-Chat | 2025.12 | 60.7 |
| GRPO | DeepSeek-R1-D... | 2025.12 | 52.7 |
| BCO | DeepSeek-R1-D... | 2025.12 | 50.9 |
| BCO | LLaMA-2-13B-Chat | 2025.12 | 47.1 |
| DPO | DeepSeek-R1-D... | 2025.12 | 47.1 |
| DPO | LLaMA-2-13B-Chat | 2025.12 | 39.8 |
| GRPO | LLaMA-2-13B-Chat | 2025.12 | 36.1 |
| FR | LLaMA-2-13B-Chat | 2025.12 | 33.4 |
| FR | DeepSeek-R1-D... | 2025.12 | 31.9 |