
LLM Alignment on HH-RLHF 100K samples (test)

Best result: Hard-Pair-GRPO, 82.3 Helpfulness Score


Evaluation Results

Method	Date	Helpfulness Score	(unlabeled)
Hard-Pair-GRPO	2026.05	82.3	85.7
ORPO	2026.05	80.5	83.2
DPO	2026.05	80.1	82.8
Soft-Pair-GRPO	2026.05	79.5	82.1
Standard GRPO	2026.05	78.2	81.5
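One way to read these results is to compare every method against the Standard GRPO row. The sketch below is a hypothetical helper and is not part of the leaderboard. It uses only the Helpfulness Score column and leaves out the second score column, whose label is not given. The names `scores` and `gains_over` are illustrative. The differences are raw point gaps on the reported test scores and do not account for variance or significance.

```python
# Helpfulness Scores transcribed from the results table (first score column only).
# The second score column is omitted because its metric label is not given.
scores = {
    "Hard-Pair-GRPO": 82.3,
    "ORPO": 80.5,
    "DPO": 80.1,
    "Soft-Pair-GRPO": 79.5,
    "Standard GRPO": 78.2,
}

BASELINE = "Standard GRPO"


def gains_over(baseline: str, table: dict[str, float]) -> dict[str, float]:
    """Return each method's score minus the baseline's score, rounded to 0.1 points."""
    base = table[baseline]
    return {method: round(score - base, 1) for method, score in table.items()}


if __name__ == "__main__":
    # Rank methods from highest to lowest Helpfulness Score.
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    gains = gains_over(BASELINE, scores)
    for rank, (method, score) in enumerate(ranked, start=1):
        print(f"{rank}. {method:<15} {score:5.1f}  ({gains[method]:+.1f} vs {BASELINE})")
```

With these numbers, Hard-Pair-GRPO is 4.1 points above Standard GRPO and 2.8 points above Soft-Pair-GRPO.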