Memento: Personalized RAG-Style Long-Retention Data Scaling for META Ads Recommendation

About

Modeling of long history data suffers from long-context window attention dilution, system efficiency and catastrophic forgetting problems, where naive linear scaling approach like LastN would fail. We introduce Memento, a personalized retrieval-augmented framework that treats historical user engagements as a document corpus and ad requests as queries, retrieving relevant interactions via Maximal Marginal Relevance (MMR) to balance similarity with diversity. We identify two complementary applications: Representation Memento, which retrieves historical embeddings for feature augmentation, and Data Memento, which retrieves past training examples for multipass training. Through infrastructure co-design -- temporal chunking, INT8 quantization, and asynchronous serving -- Memento achieves 5-10$\times$ resource efficiency over linear scaling. Memento processes daily requests with sub-10ms latency, yielding 0.25-0.3% Normalized Entropy gain on both click-through and conversion prediction. In production, Memento delivers a 1% CTR lift on Facebook Feed and Reels and a 1.2% CVR lift, scaling personalization to 365+ days of history.

Xiaoyu Chen, Ruichen Wang, Jieming Di, Suofei Feng, Nafis Abrar, Lilly Kumari, Tony Tsui, Yilin Liu, Yu Lu, Sowmya Patapati, Junwei Xiong, Qiao Yang, Dorothy Sun, Yang Cao, Victor Chen, Pan Chen, Ramsundar Sundarkumar, Shivendra Pratap Singh, Arnold Overwijk, Ling Leng, Dinesh Ramasamy, Sri Reddy, Robert Malkin, Sandeep Pandey• 2026

Related benchmarks

Task	Dataset	Result
CTR Prediction	Internal Production Advertising Dataset	CTR N.E.-0.25	6
CVR prediction	Internal Production Advertising Dataset	CVR Relative Error-0.26	6
Conversion Rate Prediction	Production CVR Model Data 150+ day V0 (Evaluation)	Normalized Error (Eval)-0.195	4

Showing 3 of 3 rows

Other info

Follow for update

@wizwand_team Discord