LONGER: Scaling Up Long Sequence Modeling in Industrial Recommenders
About
Modeling ultra-long user behavior sequences is critical for capturing both long- and short-term preferences in industrial recommender systems. Existing solutions typically rely on two-stage retrieval or indirect modeling paradigms, incurring upstream-downstream inconsistency and computational inefficiency. In this paper, we present LONGER, a Long-sequence Optimized traNsformer for GPU-Efficient Recommenders. LONGER incorporates (i) a global token mechanism that stabilizes attention over long contexts, (ii) a token merge module with lightweight InnerTransformers and a hybrid attention strategy that reduces the quadratic attention cost, and (iii) a series of engineering optimizations, including mixed-precision training with activation recomputation, KV cache serving, and a fully synchronous training and serving framework for unified GPU-based dense and sparse parameter updates. LONGER consistently outperforms strong baselines in both offline metrics and online A/B tests across advertising and e-commerce services at ByteDance, demonstrating its effectiveness and industrial-level scaling laws. LONGER is now fully deployed in more than 10 influential scenarios at ByteDance, serving billions of users.
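To make the complexity argument concrete, the sketch below illustrates the token-merge idea in plain NumPy: consecutive behavior tokens are pooled into group representations and a few global tokens are prepended before full self-attention, so attention runs over roughly L/k + G tokens instead of L. This is only an illustrative approximation, not the paper's implementation: mean pooling stands in for the lightweight InnerTransformer, the global tokens are random rather than learned, and the function names (`merge_tokens`, `longer_style_attention`) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    # Naive single-head self-attention: cost grows quadratically
    # with the number of tokens attended over.
    scores = x @ x.T / np.sqrt(x.shape[-1])
    return softmax(scores, axis=-1) @ x

def merge_tokens(seq, group_size):
    # Stand-in for the lightweight InnerTransformer: each group of
    # `group_size` consecutive behaviors is mean-pooled into one token.
    length, dim = seq.shape
    assert length % group_size == 0
    return seq.reshape(length // group_size, group_size, dim).mean(axis=1)

def longer_style_attention(seq, n_global=4, group_size=8):
    # Prepend global tokens (random here, learned in practice) and
    # attend over [global tokens; merged sequence] instead of the
    # raw sequence, shrinking the attention matrix.
    dim = seq.shape[-1]
    global_tokens = np.random.randn(n_global, dim)
    merged = merge_tokens(seq, group_size)
    tokens = np.concatenate([global_tokens, merged], axis=0)
    return self_attention(tokens)

seq = np.random.randn(1024, 64)   # 1024 behaviors, 64-dim embeddings
out = longer_style_attention(seq)
print(out.shape)                  # attention over 4 + 1024/8 = 132 tokens
```

With a group size of 8, the attention matrix shrinks from 1024x1024 to 132x132, roughly a 60x reduction in attention FLOPs for this toy configuration.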
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Sequence Modeling | Industrial Dataset | AUC | 64.48 | 48 |
| Sequence Modeling | Taobao-MM | AUC | 0.6714 | 12 |
| CTR Prediction | Industry | AUC | 0.7007 | 11 |
| CTR Prediction | Alibaba | AUC | 0.6457 | 11 |
| CTR Prediction | Ele.me | AUC | 0.665 | 11 |
| CTR Prediction | Industrial Dataset (Douyin Search) (test) | AUC | 0.6478 | 9 |
| Click-Through Rate Prediction | Baidu real-world industrial dataset (test) | AUC | 0.8361 | 7 |