Top-K Off-Policy Correction for a REINFORCE Recommender System

About

Industrial recommender systems deal with extremely large action spaces, with many millions of items to recommend. Moreover, they need to serve billions of users, each of whom is unique at any point in time, which creates a complex user state space. Luckily, huge quantities of logged implicit feedback (e.g., user clicks, dwell time) are available for learning. Learning from the logged feedback is, however, subject to biases caused by only observing feedback on recommendations selected by previous versions of the recommender. In this work, we present a general recipe for addressing such biases in a production top-K recommender system at YouTube, built with a policy-gradient-based algorithm, i.e., REINFORCE. The contributions of the paper are: (1) scaling REINFORCE to a production recommender system with an action space on the order of millions; (2) applying off-policy correction to address data biases in learning from logged feedback collected from multiple behavior policies; (3) proposing a novel top-K off-policy correction to account for our policy recommending multiple items at a time; (4) showcasing the value of exploration. We demonstrate the efficacy of our approaches through a series of simulations and multiple live experiments on YouTube.
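
The abstract names two corrections without giving their form. As a rough illustration only, the sketch below applies an importance weight pi(a|s) / beta(a|s) for the off-policy correction and a multiplier K * (1 - pi(a|s))^(K-1) for the top-K correction, which is the form recalled from the paper itself and not stated on this page. All function and parameter names here are hypothetical, the optional weight cap is just one common variance-reduction choice, and the softmax policy over a handful of items stands in for a production model with millions of actions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of item logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_multiplier(pi_a, k):
    """Top-K correction factor K * (1 - pi(a|s))^(K-1) (assumed form).

    It equals 1 when k == 1 and shrinks toward 0 as pi(a|s) approaches 1.
    """
    return k * (1.0 - pi_a) ** (k - 1)


def corrected_weight(pi_a, beta_a, k, cap=None):
    """Scalar weight applied to reward * grad log pi(a|s) for one logged sample.

    pi_a:   probability of the logged item under the policy being trained
    beta_a: probability of the logged item under the behavior (logging) policy
    cap:    optional upper bound on the importance weight (an assumption here)
    """
    importance = pi_a / beta_a
    if cap is not None:
        importance = min(importance, cap)
    return importance * top_k_multiplier(pi_a, k)


def corrected_logit_gradient(logits, action, reward, beta_a, k, cap=None):
    """Corrected REINFORCE gradient with respect to the softmax logits."""
    pi = softmax(logits)
    weight = corrected_weight(pi[action], beta_a, k, cap)
    # Gradient of log softmax w.r.t. the logits is onehot(action) - pi.
    return [
        weight * reward * ((1.0 if i == action else 0.0) - p)
        for i, p in enumerate(pi)
    ]


if __name__ == "__main__":
    grad = corrected_logit_gradient(
        logits=[2.0, 0.5, -1.0], action=1, reward=1.0, beta_a=0.3, k=16
    )
    print(grad)
```

With k = 1 the multiplier is 1 and the update reduces to the plain importance-weighted REINFORCE gradient; for larger k, items the policy already selects with high probability receive a smaller update.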

Minmin Chen, Alex Beutel, Paul Covington, Sagar Jain, Francois Belletti, Ed Chi • 2018

Related benchmarks

Task | Dataset | Result | Rank
Click-Through Rate Prediction | Avazu (test) | AUC 0.7867 | 191
Off-Policy Learning | Wiki10-31K Synthetic tau=1 (test) | P@5 55.26 | 14
Off-Policy Learning | Wiki10-31K Synthetic tau=2 (test) | P@5 0.5409 | 14
Off-Policy Learning | Wiki10-31K Synthetic tau=0.5 (test) | P@5 0.5515 | 14
Recommendation | Yahoo! R3 (test) | P@5 28.08 | 13
Recommendation | KuaiRec (test) | Precision@50 87.5 | 13
Recommendation | Coat (test) | Precision@5 0.2758 | 13
Multi-objective Recommendation | Spotify (offline) | Listening Rate 58.26 | 9
Multi-objective Recommendation | Alibaba-Youku (offline) | VV 71.68 | 9
Multi-objective Recommendation | Yelp (offline) | Relevance 0.6536 | 9
Showing 10 of 12 rows
