LLM Alignment on UltraFeedback 200K samples (test)
Top result: Hard-Pair-GRPO, Win Rate 81.9 (May 7, 2026)
Updated 26d ago
Evaluation Results
Method           Settings                     Date      Win Rate
Hard-Pair-GRPO   Base Model=LLaMA-2-7B-...    2026.05   81.9
ORPO             Base Model=LLaMA-2-7B-...    2026.05   79.1
DPO              Base Model=LLaMA-2-7B-...    2026.05   78.5
Soft-Pair-GRPO   Base Model=LLaMA-2-7B-...    2026.05   77.8
Standard GRPO    Base Model=LLaMA-2-7B-...    2026.05   76.3