
Search-based User Interest Modeling with Lifelong Sequential Behavior Data for Click-Through Rate Prediction

About

Rich user behavior data has proven to be of great value for click-through rate (CTR) prediction, especially in industrial applications such as recommender systems and online advertising. Both industry and academia have paid much attention to this topic and have proposed different approaches to modeling long sequential user behavior data. Among them, the memory-network-based model MIMN, proposed by Alibaba, achieves SOTA through the co-design of the learning algorithm and the serving system. MIMN is the first industrial solution that can model sequential user behavior data with lengths scaling up to 1000. However, MIMN fails to precisely capture user interests given a specific candidate item when the length of the user behavior sequence increases further, say, by 10 times or more. This challenge exists widely in previously proposed approaches. In this paper, we tackle this problem by designing a new modeling paradigm, which we name the Search-based Interest Model (SIM). SIM extracts user interests with two cascaded search units: (i) the General Search Unit performs a general search over the raw, arbitrarily long sequential behavior data, using query information from the candidate item, and returns a Sub user Behavior Sequence (SBS) that is relevant to the candidate item; (ii) the Exact Search Unit models the precise relationship between the candidate item and the SBS. This cascaded search paradigm gives SIM a better ability to model lifelong sequential behavior data in both scalability and accuracy. Apart from the learning algorithm, we also share our hands-on experience of implementing SIM in large-scale industrial systems. Since 2019, SIM has been deployed in the display advertising system at Alibaba, bringing a 7.1% CTR lift and a 4.4% RPM lift, which is significant to the business. Now serving the main traffic in our real system, SIM models user behavior data with a maximum length of up to 54000, pushing SOTA to 54x.
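The two-stage cascade described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the General Search Unit selects behaviors, so the sketch assumes a simple category match against the candidate item, and it stands in plain scaled dot-product attention for the Exact Search Unit. The names `Behavior`, `general_search`, `exact_search`, and `max_len` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Behavior:
    item_id: int
    category: int            # hypothetical attribute used as the search key
    embedding: List[float]   # hypothetical item embedding


def general_search(behaviors: List[Behavior], candidate_category: int,
                   max_len: int = 200) -> List[Behavior]:
    """Stage 1 (assumed category match): cut the long history down to a
    short Sub user Behavior Sequence (SBS) relevant to the candidate."""
    matched = [b for b in behaviors if b.category == candidate_category]
    return matched[-max_len:]  # keep the most recent matches


def exact_search(sbs: List[Behavior], candidate_embedding: List[float]) -> List[float]:
    """Stage 2 (assumed dot-product attention): weight each SBS behavior by
    its similarity to the candidate and return the pooled interest vector."""
    dim = len(candidate_embedding)
    if not sbs:
        return [0.0] * dim
    scores = [
        sum(q * k for q, k in zip(candidate_embedding, b.embedding)) / math.sqrt(dim)
        for b in sbs
    ]
    top = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [
        sum(w * b.embedding[i] for w, b in zip(weights, sbs)) / total
        for i in range(dim)
    ]


# Toy history: three behaviors, two of which share the candidate's category.
history = [
    Behavior(1, 7, [1.0, 0.0]),
    Behavior(2, 3, [0.0, 1.0]),
    Behavior(3, 7, [0.5, 0.5]),
]
sbs = general_search(history, candidate_category=7)
interest = exact_search(sbs, candidate_embedding=[1.0, 0.0])
```

The point of the split is cost: the first stage is a cheap filter that can run over tens of thousands of behaviors, while the more expensive attention in the second stage only ever sees the short SBS.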

Pi Qi, Xiaoqiang Zhu, Guorui Zhou, Yujing Zhang, Zhe Wang, Lejian Ren, Ying Fan, Kun Gai • 2020

Related benchmarks

Task                          Dataset                        Metric  Result  Rank
CTR Prediction                JD                             AUC     79.44   13
CTR Prediction                Pixel-1M                       AUC     0.6635  13
CTR Prediction                Alibaba                        AUC     0.6247  11
CTR Prediction                Ele.me                         AUC     0.6414  11
CTR Prediction                Industry                       AUC     0.6936  11
Long-sequence CTR prediction  MicroVideo (test)              GAUC    0.6954  10
Long-sequence CTR prediction  KuaiVideo ChinaMM 2018 (test)  GAUC    65.77   10
CTR Prediction                MicroVideo1.7M                 GAUC    0.7017  10
CTR Prediction                EBNeRD small                   GAUC    0.696   10
CTR Prediction                KuaiVideo                      GAUC    0.6672  10
Showing 10 of 15 rows
