Human Evaluation on UltraFeedback 50 sampled questions
[Chart: Win Rate (Expert 1) and Win Rate (Expert 2) by date. Top result: OTPO, Win Rate (Expert 1) of 62, May 24, 2025.]
Evaluation Results

Method   Base Model        Date      Win Rate (Expert 1)   Win Rate (Expert 2)
OTPO     Llama-3-8B-...    2025.05   62                    64
SimPO    Llama-3-8B-...    2025.05   56                    54
LDDPO    Llama-3-8B-...    2025.05   56                    48
SamPO    Llama-3-8B-...    2025.05   48                    46
DPO      Llama-3-8B-...    2025.05   46                    50