Out-of-Domain (OOD) Bias Evaluation on StereoSet
Best result: C2PO, 67.2 accuracy (Dec 29, 2025)
Evaluation Results
| Method | Backbone | Date | Accuracy |
|---|---|---|---|
| C2PO | LLaMA-2-13B-Chat | 2025.12 | 67.2 |
| CPO | LLaMA-2-13B-Chat | 2025.12 | 64 |
| GRPO | LLaMA-2-13B-Chat | 2025.12 | 60.4 |
| CPO | DeepSeek-R1-D... | 2025.12 | 60.1 |
| C2PO | DeepSeek-R1-D... | 2025.12 | 60.1 |
| GRPO | DeepSeek-R1-D... | 2025.12 | 60 |
| FR | DeepSeek-R1-D... | 2025.12 | 58.1 |
| IPO | LLaMA-2-13B-Chat | 2025.12 | 53.8 |
| BCO | LLaMA-2-13B-Chat | 2025.12 | 53 |
| IPO | DeepSeek-R1-D... | 2025.12 | 44.9 |
| FR | LLaMA-2-13B-Chat | 2025.12 | 42.8 |
| DPO | LLaMA-2-13B-Chat | 2025.12 | 41.8 |
| DPO | DeepSeek-R1-D... | 2025.12 | 33.4 |
| BCO | DeepSeek-R1-D... | 2025.12 | 33.3 |