
StableDR: Stabilized Doubly Robust Learning for Recommendation on Data Missing Not at Random

About

In recommender systems, users tend to rate only their favorite items, which makes the data missing not at random and poses a great challenge for unbiased evaluation and learning of prediction models. Doubly robust (DR) methods have been widely studied and demonstrate superior performance. However, in this paper, we show that DR methods are unstable, with bias, variance, and generalization bounds that become unbounded under extremely small propensities. Moreover, DR's heavier reliance on extrapolation leads to suboptimal performance. To address these limitations while retaining double robustness, we propose a stabilized doubly robust (StableDR) learning approach with a weaker reliance on extrapolation. Theoretical analysis shows that StableDR simultaneously has a bounded bias, variance, and generalization error bound under inaccurate imputed errors and arbitrarily small propensities. In addition, we propose a novel learning approach for StableDR that updates the imputation, propensity, and prediction models cyclically, achieving more stable and accurate predictions. Extensive experiments show that our approaches significantly outperform existing methods.
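To make the instability concrete, here is a minimal NumPy sketch contrasting a standard DR error estimator with a self-normalized ("stabilized") variant. This is not the authors' implementation; all function and variable names are illustrative, and the stabilization shown (normalizing the inverse-propensity correction by the summed weights) is one common way to bound the estimate under tiny propensities; see the paper for the exact StableDR estimator.

```python
import numpy as np

def dr_estimator(errors_hat, errors_obs, obs_mask, propensities):
    """Standard doubly robust (DR) estimate of the average prediction error.

    errors_hat   : imputed errors for all user-item pairs
    errors_obs   : observed errors for rated pairs (arbitrary elsewhere)
    obs_mask     : 1.0 if the pair is observed (rated), else 0.0
    propensities : estimated probability that each pair is observed

    The correction term divides by the propensity, so a single tiny
    propensity on an observed pair can blow up the estimate.
    """
    correction = obs_mask * (errors_obs - errors_hat) / propensities
    return np.mean(errors_hat + correction)

def stable_dr_estimator(errors_hat, errors_obs, obs_mask, propensities):
    """Self-normalized DR sketch: the inverse-propensity correction is
    divided by the sum of the weights rather than the sample size,
    which keeps the estimate bounded by the range of the errors even
    for arbitrarily small propensities."""
    w = obs_mask / propensities
    correction = np.sum(w * (errors_obs - errors_hat)) / np.sum(w)
    return np.mean(errors_hat) + correction
```

With well-calibrated propensities the two estimators agree; the difference shows up when a propensity is near zero, where the standard DR estimate can explode while the self-normalized one stays within the range of the observed errors.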

Haoxuan Li, Chunyuan Zheng, Peng Wu • 2022

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Reward Modeling | HelpSteer (test) | MAE 0.318 | 48 |
| Safety Evaluation | HarmBench | -- | 42 |
| Safety Evaluation | WildGuardMix | Safety Score 0.8534 | 22 |
| Reward Modeling | UltraFeedback (test) | MAE 0.272 | 21 |
| Reward Modeling | PKU-SafeRLHF (test) | MAE 0.1771 | 19 |
| Safety Evaluation | DAN | Safety Score (DAN) 0.782 | 18 |
| Safety Alignment | StrongREJECT | -- | 18 |
| Rating Prediction | Music unbiased (test) | AUC 68.7 | 12 |
| Rating Prediction | Coat unbiased (test) | AUC 0.719 | 12 |
| Rating Prediction | KuaiRec unbiased (test) | AUC 76.4 | 12 |

Showing 10 of 11 rows.
