Direct Preference Optimization on RLHFlow AlpacaEval 2.0

19.85 LCWR

Difficulty-Based Preference Data Selection

Aug 6, 2025
Updated 16d ago

Evaluation Results

Date      LCWR    (unlabeled)
2025.08   19.85   19.44
2025.08   18.74   17.93
2025.08   18.57   18.13
2025.08   18.34   18.06
2025.08   18.09   17.83
2025.08   17.52   16.73
2025.08    2.57    2.16