Human Evaluation on UltraFeedback 50 sampled questions
[Chart: Win Rate (Expert 1) and Win Rate (Expert 2). Top entry: OTPO, 62 (Expert 1), May 24, 2025.]
Evaluation Results
Method  Details                    Date     Win Rate (Expert 1) (%)  Win Rate (Expert 2) (%)
OTPO    Base Model=Llama-3-8B-...  2025.05  62                       64
SimPO   Base Model=Llama-3-8B-...  2025.05  56                       54
LDDPO   Base Model=Llama-3-8B-...  2025.05  56                       48
SamPO   Base Model=Llama-3-8B-...  2025.05  48                       46
DPO     Base Model=Llama-3-8B-...  2025.05  46                       50
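The two expert columns can be read together with a little arithmetic. The sketch below is illustrative only: it assumes each win rate is a percentage of the 50 sampled questions named in the title, so 62 would correspond to 31 questions. The page does not say how ties are counted. The mean across the two experts and the ranking derived from it are not metrics reported by the page, and the names `WIN_RATES`, `implied_wins`, and `mean_rate` are hypothetical.

```python
# Win rates (%) copied from the results table: (Expert 1, Expert 2).
WIN_RATES = {
    "OTPO": (62, 64),
    "SimPO": (56, 54),
    "LDDPO": (56, 48),
    "SamPO": (48, 46),
    "DPO": (46, 50),
}

NUM_QUESTIONS = 50  # taken from the benchmark title


def implied_wins(rate_percent, n=NUM_QUESTIONS):
    """Assumption: rate = wins / n * 100, so wins = rate * n / 100."""
    return round(rate_percent * n / 100)


def mean_rate(rates):
    """Unweighted mean over the experts (an illustrative summary, not a page metric)."""
    return sum(rates) / len(rates)


# Rank methods by the mean of the two expert win rates, highest first.
ranking = sorted(WIN_RATES, key=lambda m: mean_rate(WIN_RATES[m]), reverse=True)

for method in ranking:
    e1, e2 = WIN_RATES[method]
    print(
        f"{method}: mean {mean_rate((e1, e2)):.1f}%, "
        f"implied wins {implied_wins(e1)} and {implied_wins(e2)} of {NUM_QUESTIONS}"
    )
```

Under this averaging, DPO moves ahead of SamPO even though the table, which appears to be sorted by Expert 1, lists SamPO first. The order of the lower rows therefore depends on which expert is used.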