Efficient Dataset Selection for Continual Adaptation of Generative Recommenders
About
Recommendation systems must continuously adapt to evolving user behavior, yet the volume of data generated in large-scale streaming environments makes frequent full retraining impractical. This work investigates how targeted data selection can mitigate performance degradation caused by temporal distributional drift while maintaining scalability. We evaluate a range of representation choices and sampling strategies for curating small but informative subsets of user interaction data. Our results demonstrate that gradient-based representations, coupled with distribution-matching, improve downstream model performance, achieving training efficiency gains while preserving robustness to drift. These findings highlight data curation as a practical mechanism for scalable monitoring and adaptive model updates in production-scale recommendation systems.
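As a rough illustration of the idea in the abstract, the sketch below pairs a gradient-based representation with a simple distribution-matching selector. It is a minimal toy under stated assumptions, not the method evaluated in this work: the logistic model, the function names `per_example_gradients` and `greedy_mean_matching`, and the choice of matching only the mean of the gradient features are all hypothetical stand-ins chosen for brevity.

```python
import numpy as np

def per_example_gradients(X, y, w):
    """Logistic-loss gradient w.r.t. w for each example (one row per example).

    Assumption: a linear logistic model stands in for the recommender.
    """
    p = 1.0 / (1.0 + np.exp(-X @ w))   # predicted probabilities
    return (p - y)[:, None] * X        # shape: (n_examples, n_features)

def greedy_mean_matching(pool, target_mean, k):
    """Greedily pick k distinct pool rows whose running mean tracks target_mean.

    Assumption: matching the first moment only is a crude proxy for
    distribution matching.
    """
    pool = np.asarray(pool, dtype=float)
    target_mean = np.asarray(target_mean, dtype=float)
    selected = []
    running_sum = np.zeros(pool.shape[1], dtype=float)
    available = np.ones(len(pool), dtype=bool)
    for step in range(1, k + 1):
        # Mean of the subset if each candidate row were added next.
        candidate_means = (running_sum + pool) / step
        dist = np.linalg.norm(candidate_means - target_mean, axis=1)
        dist[~available] = np.inf      # never pick the same row twice
        best = int(np.argmin(dist))
        selected.append(best)
        available[best] = False
        running_sum += pool[best]
    return selected

# Toy usage with synthetic data: select 20 historical interactions whose
# gradient features resemble those of a small, recent (drifted) sample.
rng = np.random.default_rng(0)
w = rng.normal(size=8)
X_hist, y_hist = rng.normal(size=(500, 8)), rng.integers(0, 2, size=500)
X_new, y_new = rng.normal(loc=0.5, size=(50, 8)), rng.integers(0, 2, size=50)

g_hist = per_example_gradients(X_hist, y_hist, w)
g_new = per_example_gradients(X_new, y_new, w)
subset = greedy_mean_matching(g_hist, g_new.mean(axis=0), k=20)
```

The selected indices in `subset` would then be used to fine-tune the model in place of the full historical pool. In a production setting the per-example gradients would more plausibly come from a subset of the recommender's own parameters, and the matching criterion could be richer than a mean.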
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Next-item prediction | Proprietary music and podcast streaming dataset (1-year evaluation horizon) | NDCG@50 Error Recovery: 72 | 4 |
| Next-item prediction | Proprietary music and podcast streaming dataset (3-year evaluation horizon) | NDCG@50 Drift Recovery: 78 | 4 |
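For readers unfamiliar with the base metric, the sketch below computes plain NDCG@k for next-item prediction, assuming exactly one held-out relevant item per user (so the ideal DCG is 1). The "Error Recovery" and "Drift Recovery" variants reported in the table are not defined in this listing and are not reproduced here; the function name `ndcg_at_k` is a hypothetical helper.

```python
import math

def ndcg_at_k(ranked_items, true_item, k=50):
    """NDCG@k with a single relevant item.

    Returns 1 / log2(rank + 1) if true_item appears at 1-based position
    `rank` within the top k, and 0.0 otherwise.
    """
    for position, item in enumerate(ranked_items[:k], start=1):
        if item == true_item:
            return 1.0 / math.log2(position + 1)
    return 0.0

# A hit at rank 1 scores 1.0; a hit at rank 3 scores 1 / log2(4) = 0.5.
score_top = ndcg_at_k(["a", "b", "c"], "a")
score_third = ndcg_at_k(["a", "b", "c"], "c")
```

Averaging this score over all evaluated users gives the dataset-level NDCG@50 when `k=50`.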