
LLM Alignment on UltraFeedback 200K samples (test)

Win Rate: 81.9

Hard-Pair-GRPO

[Chart: Win Rate (y-axis ticks 76.076, 77.588, 79.1, 80.612); x-axis: May 7, 2026]
Updated 26d ago

Evaluation Results

Date       Win Rate
2026.05    81.9
2026.05    79.1
2026.05    78.5
2026.05    77.8
2026.05    76.3