RosePO: Aligning LLM-based Recommenders with Human Values

About

Recently, there has been growing interest in leveraging Large Language Models (LLMs) for recommendation systems, typically by adapting a pre-trained LLM to the recommendation scenario through supervised fine-tuning (SFT). However, neither the pre-training nor the SFT stage explicitly models the comparative relationships among a user's preferences for different items. To construct a "helpful and harmless" LLM-based recommender, we propose a general framework, Recommendation with smoothing personalized Preference Optimization (RosePO), which better aligns with customized human values during the post-training stage. Specifically, in addition to the input and chosen response that naturally align with SFT data, we design a rejected sampling strategy tailored for enhancing helpfulness, along with two strategies aimed at mitigating biases to promote harmlessness. To ensure robustness against the uncertain labels present in automatically constructed preference data, we introduce a personalized smoothing factor, predicted by a preference oracle, into the optimization objective. Evaluation on three real-world datasets demonstrates the effectiveness of our method, showing not only improved recommendation performance but also mitigation of semantic hallucination and popularity bias.
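To make the idea of a smoothing factor in a preference objective concrete, here is a minimal sketch. It assumes a label-smoothed, DPO-style pairwise loss in which each (chosen, rejected) pair carries its own smoothing value `eps`; the abstract does not give the exact objective, so this form, the function names, and the default `beta` are illustrative assumptions rather than the paper's formulation. In RosePO, `eps` would come from the preference oracle; here it is simply passed in.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def smoothed_preference_loss(
    policy_chosen: float,
    policy_rejected: float,
    ref_chosen: float,
    ref_rejected: float,
    eps: float,
    beta: float = 0.1,
) -> float:
    """Label-smoothed pairwise preference loss for one example (assumed form).

    Inputs are sequence log-probabilities under the policy and a frozen
    reference model. `eps` in [0, 1] is a per-example smoothing factor
    (hypothetically supplied by a preference oracle): eps = 0 fully trusts
    the label "chosen beats rejected", eps = 0.5 treats it as uninformative.
    """
    # Implicit reward margin between chosen and rejected responses.
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # Mix the loss for the given label with the loss for the flipped label.
    return -(1.0 - eps) * log_sigmoid(margin) - eps * log_sigmoid(-margin)


if __name__ == "__main__":
    confident = smoothed_preference_loss(-1.0, -3.0, -2.0, -2.0, eps=0.0)
    uncertain = smoothed_preference_loss(-1.0, -3.0, -2.0, -2.0, eps=0.4)
    print(f"eps=0.0: {confident:.4f}  eps=0.4: {uncertain:.4f}")
```

With a larger `eps`, a pair whose label is doubtful contributes a weaker push toward the chosen item, which is the robustness property the abstract attributes to the personalized smoothing factor.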

Jiayi Liao, Xiangnan He, Ruobing Xie, Jiancan Wu, Yancheng Yuan, Xingwu Sun, Zhanhui Kang, Xiang Wang • 2024

Related benchmarks

Task	Dataset	Result
Recommendation	Amazon CD and Vinyl (test)	NDCG@10 0.0061	26
Sequential Recommendation	Amazon Toy	N@5 1.14	24
Generative Recommendation	ML OOD 10M	Hit Rate@10 51	18
Sequential Recommendation	Amazon-Book	N@5 1.17	15
Sequential Recommendation	Amazon Office	N@5 2.66	15
Sequential Recommendation	Amazon Clothing	N@5 0.0066	15
Sequential Recommendation	ArT (test)	Hit@5 0.087	13
Sequential Recommendation	Instrument (test)	Hit Rate@5 8.61	13
Sequential Recommendation	Game (test)	Hit@5 6.32	13
Generative Recommendation	Yelp OOD 2018	H@10 0.82	9

Showing 10 of 13 rows
