LLM Alignment on UltraFeedback (200K samples, test split)

Best Win Rate: 81.9 (Hard-Pair-GRPO)

Updated 2mo ago

Evaluation Results

Method	Date	Win Rate
Hard-Pair-GRPO	2026.05	81.9
ORPO	2026.05	79.1
DPO	2026.05	78.5
Soft-Pair-GRPO	2026.05	77.8
Standard GRPO	2026.05	76.3
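To make the spread between methods easier to read, the short sketch below ranks the reported win rates and computes each method's difference, in points, from Standard GRPO. The scores are copied from the Evaluation Results table; the dictionary `results` and the helper `gains_over` are hypothetical names used only for illustration, and treating Standard GRPO as the reference point is an illustrative choice, not something the leaderboard states.

```python
# Win rates copied from the Evaluation Results table.
results = {
    "Hard-Pair-GRPO": 81.9,
    "ORPO": 79.1,
    "DPO": 78.5,
    "Soft-Pair-GRPO": 77.8,
    "Standard GRPO": 76.3,
}


def gains_over(baseline: str, scores: dict[str, float]) -> dict[str, float]:
    """Hypothetical helper: each method's win-rate difference (points) vs `baseline`."""
    base = scores[baseline]
    # Round to one decimal to match the precision of the reported scores
    # and hide floating-point noise from the subtraction.
    return {m: round(s - base, 1) for m, s in scores.items() if m != baseline}


if __name__ == "__main__":
    # Rank methods from highest to lowest win rate.
    ranked = sorted(results.items(), key=lambda kv: kv[1], reverse=True)
    for rank, (method, score) in enumerate(ranked, start=1):
        print(f"{rank}. {method}: {score}")
    # Show each method's margin over the (assumed) Standard GRPO reference.
    for method, delta in gains_over("Standard GRPO", results).items():
        print(f"{method}: {delta:+.1f} vs Standard GRPO")
```

On these numbers, Hard-Pair-GRPO leads Standard GRPO by 5.6 points and ORPO, the runner-up, by 2.8 points.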