
SHP

Benchmarks

Task Name                        Dataset Name         SOTA Result      Trend
LLM Alignment                    SHP                  Diversity 89.3   15
Direct Preference Optimization   SHP AlpacaEval 2.0   LCWR 18.44       14
Reward Maximization              SHP                  Win Rate 0.53    12
Binary/Pairwise Classification   SHP                  Accuracy 69.5    9
Open-ended Dialogue              SHP OOD              Win Rate 77.5    4