Structural Bias Evaluation on HANS
[Chart: Accuracy. Best result: C2PO, 99.6 (Dec 29, 2025).]
Evaluation Results
Method  Backbone              Date     Accuracy
C2PO    DeepSeek-R1-D...      2025.12  99.6
GRPO    DeepSeek-R1-D...      2025.12  99
IPO     DeepSeek-R1-D...      2025.12  97.7
CPO     DeepSeek-R1-D...      2025.12  96.9
C2PO    LLaMA-2-13B-Chat      2025.12  95.9
BCO     DeepSeek-R1-D...      2025.12  86.8
BCO     LLaMA-2-13B-Chat      2025.12  86.1
CPO     LLaMA-2-13B-Chat      2025.12  85.7
IPO     LLaMA-2-13B-Chat      2025.12  77.9
GRPO    LLaMA-2-13B-Chat      2025.12  66.6
DPO     LLaMA-2-13B-Chat      2025.12  57.9
FR      LLaMA-2-13B-Chat      2025.12  54.2
FR      DeepSeek-R1-D...      2025.12  53.6
DPO     DeepSeek-R1-D...      2025.12  51.2