
Direct Preference Optimization on RLHFlow AlpacaEval 2.0

19.85 LC WR (length-controlled win rate)

Difficulty-Based Preference Data Selection


Evaluation Results

Method	Date	LC WR (%)	WR (%)
Difficulty-Based Preference Data Selection	2025.08	19.85	19.44
Full Set	2025.08	18.74	17.93
Random	2025.08	18.57	18.13
ZIP†	2025.08	18.34	18.06
SDPO	2025.08	18.09	17.83
DiverseEvol†	2025.08	17.52	16.73
Tulu3-SFT	2025.08	2.57	2.16
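To make the margins in the table explicit, the short Python sketch below recomputes each method's difference from a chosen baseline (for example, Full Set or Random) in percentage points. It only re-does arithmetic on the reported numbers. It assumes the first numeric column is LC WR (matching the headline 19.85 LC WR) and the second is the raw AlpacaEval 2.0 win rate (WR). The names `RESULTS` and `deltas_vs` are hypothetical and not part of any leaderboard API.

```python
# Scores copied from the leaderboard table: (LC WR, WR), both in percent.
# Assumption: the second numeric column is the raw (non-length-controlled) win rate.
RESULTS = {
    "Difficulty-Based Preference Data Selection": (19.85, 19.44),
    "Full Set": (18.74, 17.93),
    "Random": (18.57, 18.13),
    "ZIP†": (18.34, 18.06),
    "SDPO": (18.09, 17.83),
    "DiverseEvol†": (17.52, 16.73),
    "Tulu3-SFT": (2.57, 2.16),
}


def deltas_vs(baseline: str) -> dict[str, tuple[float, float]]:
    """Return each method's (LC WR, WR) difference from `baseline`, in percentage points."""
    base_lc, base_wr = RESULTS[baseline]
    # Round to 2 decimals to match the table's precision and hide float noise.
    return {
        name: (round(lc - base_lc, 2), round(wr - base_wr, 2))
        for name, (lc, wr) in RESULTS.items()
    }


if __name__ == "__main__":
    # Rank by LC WR (the headline metric) and show deltas against Full Set.
    table = deltas_vs("Full Set")
    for name, (lc, wr) in sorted(RESULTS.items(), key=lambda kv: kv[1][0], reverse=True):
        d_lc, d_wr = table[name]
        print(f"{name:45s} LC WR {lc:6.2f} ({d_lc:+.2f})  WR {wr:6.2f} ({d_wr:+.2f})")
```

Against Full Set, this gives Difficulty-Based Preference Data Selection a margin of +1.11 LC WR and +1.51 WR. Against Random, the LC WR margin is +1.28.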