SHP

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| LLM Alignment | SHP | Diversity 89.3 | 15 |
| Reward Maximization | SHP | Win Rate 0.53 | 12 |
| Binary/Pairwise Classification | SHP | Accuracy 69.5 | 9 |
| Open-ended Dialogue | SHP OOD | Win Rate 77.5 | 4 |