
PPE Preference

Benchmarks

Task Name              Dataset Name           Metric                  SOTA Result  Trend
Preference Prediction  PPE Preference (test)  Preference Score        79.8         24
Reward Modeling        PPE Preference ZH      Accuracy                82.3         19
Reward Modeling        PPE-Preference 1k      Positional Consistency  51.7         8
Preference Evaluation  PPE Preference (test)  Kuiper Statistic        0.0434       8