Preference Alignment on Ultrafeedback (20% flipping ratio)
Best accuracy: 78.8 (FA-DPO, Nov 30, 2025)
Metrics: Accuracy, Win Rate
Evaluation Results
| Method | Backbone | Date | Accuracy | Win Rate |
|--------|--------------|---------|----------|----------|
| FA-DPO | Llama-3.1-8B | 2025.11 | 78.8 | 65.1 |
| FA-DPO | Mistral-7B | 2025.11 | 78.05 | 61.35 |
| ROPO | Llama-3.1-8B | 2025.11 | 75.82 | 62.75 |
| cDPO | Llama-3.1-8B | 2025.11 | 75.22 | 64.5 |
| rDPO | Llama-3.1-8B | 2025.11 | 73.96 | 59.75 |
| ROPO | Mistral-7B | 2025.11 | 73.96 | 54.3 |
| DPO | Llama-3.1-8B | 2025.11 | 73.89 | 62.9 |
| cDPO | Mistral-7B | 2025.11 | 72.99 | 47.75 |
| rDPO | Mistral-7B | 2025.11 | 71.95 | 44.75 |
| DPO | Mistral-7B | 2025.11 | 71.35 | 45.75 |
| SIMPO | Mistral-7B | 2025.11 | 67.04 | 54.75 |
| SIMPO | Llama-3.1-8B | 2025.11 | 66.59 | 62.8 |