Large Language Models are Zero-Shot Rankers for Recommender Systems

About

Recently, large language models (LLMs) (e.g., GPT-4) have demonstrated impressive general-purpose task-solving abilities, including the potential to approach recommendation tasks. Along this line of research, this work aims to investigate the capacity of LLMs that act as the ranking model for recommender systems. We first formalize the recommendation problem as a conditional ranking task, considering sequential interaction histories as conditions and the items retrieved by other candidate generation models as candidates. To solve the ranking task by LLMs, we carefully design the prompting template and conduct extensive experiments on two widely-used datasets. We show that LLMs have promising zero-shot ranking abilities but (1) struggle to perceive the order of historical interactions, and (2) can be biased by popularity or item positions in the prompts. We demonstrate that these issues can be alleviated using specially designed prompting and bootstrapping strategies. Equipped with these insights, zero-shot LLMs can even challenge conventional recommendation models when ranking candidates are retrieved by multiple candidate generators. The code and processed datasets are available at https://github.com/RUCAIBox/LLMRank.

Yupeng Hou, Junjie Zhang, Zihan Lin, Hongyu Lu, Ruobing Xie, Julian McAuley, Wayne Xin Zhao • 2023
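
The conditional ranking setup and the bootstrapping idea described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the prompt wording, the function names `build_prompt` and `bootstrap_rank`, the Borda-style score aggregation, and the `rank_fn` callable (a stand-in for an LLM call plus output parsing) are all hypothetical. The actual templates are in the linked repository.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Sequence

# Hypothetical stand-in for "call an LLM with this prompt and parse its ranking".
# It receives the prompt and the candidates in the order shown to the model,
# and returns the same candidates in ranked order (best first).
RankFn = Callable[[str, Sequence[str]], List[str]]


def build_prompt(history: Sequence[str], candidates: Sequence[str]) -> str:
    """Format the interaction history (the condition) and the candidates.

    The wording below is an assumption made for illustration only.
    """
    hist = "\n".join(f"{i}. {title}" for i, title in enumerate(history, 1))
    cands = "\n".join(f"{i}. {title}" for i, title in enumerate(candidates, 1))
    return (
        "I've interacted with the following items, in order:\n"
        f"{hist}\n\n"
        f"Here are {len(candidates)} candidate items:\n"
        f"{cands}\n\n"
        "Rank the candidates by how likely I am to interact with them next."
    )


def bootstrap_rank(
    history: Sequence[str],
    candidates: Sequence[str],
    rank_fn: RankFn,
    rounds: int = 3,
    seed: int = 0,
) -> List[str]:
    """Rank several times with shuffled candidate order, then merge.

    Shuffling the candidates between rounds is one way to reduce the
    influence of item position in the prompt. Rankings are merged with a
    simple Borda count (an assumed aggregation rule).
    """
    rng = random.Random(seed)
    scores: Dict[str, float] = defaultdict(float)
    for _ in range(rounds):
        shown = list(candidates)
        rng.shuffle(shown)
        ranked = rank_fn(build_prompt(history, shown), shown)
        for pos, item in enumerate(ranked):
            scores[item] += len(candidates) - pos  # higher rank, more points
    # sorted() is stable, so ties keep the original candidate order.
    return sorted(candidates, key=lambda c: -scores[c])
```

To try it without a model, pass a dummy `rank_fn` such as `lambda prompt, shown: sorted(shown)`. In real use, `rank_fn` would send the prompt to an LLM and map the returned text back onto the candidate titles.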

Related benchmarks

Task	Dataset	Result
Sequential Recommendation	ML 1M	--	140
Recommendation	MovieLens 1M	nDCG@10 39.8	49
Sequential Recommendation	Amazon Office (test)	NDCG@10 26.65	38
Recommendation	CDs	NDCG@5 0.1777	29
Sequential Recommendation	Amazon Video-Games	NDCG@10 0.3125	22
Multi-hop Question Answering	MuSiQue	Recall@1 23.4	22
Multi-hop Question Answering	HotpotQA	Recall@1 30.7	22
Recommendation	CDs sparse	NDCG@1 14	20
Recommendation	Games	HR@1 5.28	19
Ranking Recommendation	MIND	NDCG@10 33.04	15

