
Learning Correlated Reward Models: Statistical Barriers and Opportunities

About

Random Utility Models (RUMs) are a classical framework for modeling user preferences and play a key role in reward modeling for Reinforcement Learning from Human Feedback (RLHF). However, a crucial shortcoming of many of these techniques is the Independence of Irrelevant Alternatives (IIA) assumption, which collapses all human preferences to a universal underlying utility function, yielding a coarse approximation of the range of human preferences. On the other hand, statistical and computational guarantees for models avoiding this assumption are scarce. In this paper, we investigate the statistical and computational challenges of learning a correlated probit model, a fundamental RUM that avoids the IIA assumption. First, we establish that the classical data collection paradigm of pairwise preference data is fundamentally insufficient to learn correlational information, explaining the lack of statistical and computational guarantees in this setting. Next, we demonstrate that best-of-three preference data provably overcomes these shortcomings, and devise a statistically and computationally efficient estimator with near-optimal performance. These results highlight the benefits of higher-order preference data in learning correlated utilities, allowing for more fine-grained modeling of human preferences. Finally, we validate these theoretical guarantees on several real-world datasets, demonstrating improved personalization of human preferences.

Yeshwanth Cherapanamjeri, Constantinos Daskalakis, Gabriele Farina, Sobhan Mohammadpour • 2025
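
The abstract's contrast between pairwise and best-of-three data can be illustrated with a small simulation. The sketch below is not the paper's construction or estimator. It is a toy three-item probit example under assumptions chosen here: zero-mean, unit-variance Gaussian utilities, where items 0 and 1 have correlation `rho` and item 2 is independent. The function name `simulate` and all parameter values are hypothetical.

```python
import math
import random


def simulate(rho, n_samples=100_000, seed=0):
    """Monte Carlo estimate of pairwise and best-of-three choice frequencies.

    Assumed toy model: utilities are zero-mean, unit-variance Gaussians;
    items 0 and 1 have correlation rho, item 2 is independent of both.
    """
    rng = random.Random(seed)
    pair_wins = {(0, 1): 0, (0, 2): 0, (1, 2): 0}
    top_counts = [0, 0, 0]
    scale = math.sqrt(1.0 - rho * rho)

    for _ in range(n_samples):
        z0, z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        # Build item 1's utility so that corr(u[0], u[1]) == rho.
        u = (z0, rho * z0 + scale * z1, z2)

        # Pairwise data: which of items i and j has the higher utility.
        for (i, j) in pair_wins:
            if u[i] > u[j]:
                pair_wins[(i, j)] += 1

        # Best-of-three data: which item has the highest utility overall.
        top_counts[u.index(max(u))] += 1

    pairwise = {k: v / n_samples for k, v in pair_wins.items()}
    best_of_three = [c / n_samples for c in top_counts]
    return pairwise, best_of_three


pairwise_indep, top_indep = simulate(rho=0.0)
pairwise_corr, top_corr = simulate(rho=0.9)

print("pairwise, rho=0.0:", pairwise_indep)
print("pairwise, rho=0.9:", pairwise_corr)
print("best-of-three, rho=0.0:", top_indep)
print("best-of-three, rho=0.9:", top_corr)
```

In this toy setup every pairwise win rate is close to 1/2 for both values of `rho`, because each utility difference is a zero-mean Gaussian, so pairwise data cannot distinguish the two models. The best-of-three frequencies do differ: with independent utilities each item wins about a third of the time, while with `rho=0.9` the independent item 2 wins noticeably more often (roughly 0.45), since items 0 and 1 tend to rise and fall together.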

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Preference Prediction | MovieLens 10k | Accuracy (Q=0.25) | 61 | 20 |
| Preference Prediction | MovieLens 1k ratings | Accuracy (0.25 Quantile) | 61 | 10 |
| Preference Prediction | Netflix Prize 100k ratings | Accuracy (0.25 Quantile) | 62 | 10 |
| Preference Prediction | Netflix Prize 150k ratings | Accuracy (0.25 Quantile) | 61 | 10 |
| Preference Prediction | MovieLens 50k ratings | Accuracy (0.25 Quantile) | 59 | 10 |
| Preference Prediction | Sushi Preference Category B | Accuracy (0.25 Quantile) | 67 | 10 |
| Preference Prediction | Sushi Preference Category A | Accuracy (Q0.25) | 68 | 5 |
| Preference Prediction | Eigen-taste Jokes onehot variant | Accuracy (0.25 Q) | 61 | 5 |
