Preference Alignment on UltraFeedback (40% flipping ratio)
[Chart: Accuracy. Top result: FA-DPO, 78.87, Nov 30, 2025. Metrics available: Accuracy, Win Rate.]
Evaluation Results
Method   Backbone       Date      Accuracy   Win Rate
FA-DPO   LLama-3.1-8B   2025.11   78.87      68.5
FA-DPO   Mistral-7B     2025.11   78.49      59
ROPO     LLama-3.1-8B   2025.11   66.59      50.25
cDPO     LLama-3.1-8B   2025.11   65.7       56.1
DPO      LLama-3.1-8B   2025.11   64.96      56.1
ROPO     Mistral-7B     2025.11   64.36      36.35
rDPO     LLama-3.1-8B   2025.11   63.99      55.9
SIMPO    Mistral-7B     2025.11   63.77      45
cDPO     Mistral-7B     2025.11   63.24      34.6
rDPO     Mistral-7B     2025.11   63.1       36.65
DPO      Mistral-7B     2025.11   62.05      35.35
SIMPO    LLama-3.1-8B   2025.11   57.07      38.95