Preference Alignment on UltraFeedback (RM and GPT-4o-mini Evaluators)
[Chart: Win Rate (RM Evaluator). Leading entry: Vanilla Baseline, 71.25 (May 8, 2026). Selectable metrics: Win Rate (RM Evaluator), Win Rate (GPT-4o-mini Evaluator), Average Win Rate.]
Updated 23d ago
Evaluation Results
| Method (settings, date) | Win Rate (RM Evaluator) | Win Rate (GPT-4o-mini Evaluator) | Average Win Rate |
|---|---|---|---|
| Vanilla Baseline (Preference Dimensions=..., 2026.05) | 71.25 | 69.75 | 72.45 |
| p-soup & Direct Fine-tuning (Preference Dimensions=..., 2026.05) | 65.5 | 45.37 | 57.31 |
| Direct Prompting (Preference Dimensions=..., 2026.05) | 48.44 | 50.38 | 50.47 |