
Pairwise Preference Comparisons

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Pairwise Preference Comparison | Pairwise Preference Comparisons 1.5B Scale (test) | Avg Preference Score: 0.834 | 30