RankMixer: Scaling Up Ranking Models in Industrial Recommenders

About

Recent progress on large language models (LLMs) has spurred interest in scaling up recommendation systems, yet two practical obstacles remain. First, training and serving costs for industrial recommenders must respect strict latency bounds and high QPS demands. Second, most human-designed feature-crossing modules in ranking models were inherited from the CPU era and fail to exploit modern GPUs, resulting in low Model FLOPs Utilization (MFU) and poor scalability. We introduce RankMixer, a hardware-aware model design tailored towards a unified and scalable feature-interaction architecture. RankMixer retains the transformer's high parallelism while replacing quadratic self-attention with a multi-head token-mixing module for higher efficiency. In addition, RankMixer models both distinct feature subspaces and cross-feature-space interactions with per-token FFNs. We further extend it to one billion parameters with a Sparse-MoE variant for higher ROI. A dynamic routing strategy is adopted to address inadequate and imbalanced expert training. Experiments show RankMixer's superior scaling abilities on a trillion-scale production dataset. By replacing previously diverse handcrafted low-MFU modules with RankMixer, we boost the model MFU from 4.5% to 45% and scale our ranking model parameters by 100x while maintaining roughly the same inference latency. We verify RankMixer's universality with online A/B tests across two core application scenarios (Recommendation and Advertisement). Finally, we launch a 1B dense-parameter RankMixer for full-traffic serving without increasing the serving cost, which improves user active days by 0.3% and total in-app usage duration by 1.08%.
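The abstract names the two building blocks but does not give their equations, so the following is only a minimal NumPy sketch under stated assumptions. It assumes that token mixing is a parameter-free regrouping (each token is split into heads, and the heads with the same index are concatenated across tokens), that the number of heads equals the number of tokens so shapes are preserved, and that each token owns a separate two-layer ReLU FFN. The function names, the ReLU activation, the residual connections, and the omission of normalization are illustrative choices, not details confirmed by the text above.

```python
import numpy as np

def token_mixing(x, num_heads):
    """Parameter-free mixing (assumed form): regroup head slices across tokens.

    x: (T, D) array of T feature tokens. Returns (num_heads, T * D // num_heads).
    """
    num_tokens, dim = x.shape
    assert dim % num_heads == 0, "D must be divisible by the number of heads"
    head_dim = dim // num_heads
    heads = x.reshape(num_tokens, num_heads, head_dim)   # (T, H, D/H)
    # Output token h is the concatenation of head h taken from every input token.
    return heads.transpose(1, 0, 2).reshape(num_heads, num_tokens * head_dim)

def per_token_ffn(x, w1, b1, w2, b2):
    """Apply a separate two-layer FFN to each token (no weight sharing).

    x: (T, D); w1: (T, D, K); b1: (T, K); w2: (T, K, D); b2: (T, D).
    """
    hidden = np.einsum("td,tdk->tk", x, w1) + b1
    hidden = np.maximum(hidden, 0.0)                     # ReLU, an assumption
    return np.einsum("tk,tkd->td", hidden, w2) + b2

def rankmixer_block(x, ffn_params, num_heads):
    """One hypothetical block: mix tokens, then per-token FFN, with residuals."""
    mixed = token_mixing(x, num_heads)                   # same shape when H == T
    x = x + mixed
    return x + per_token_ffn(x, *ffn_params)

if __name__ == "__main__":
    T, D, K = 4, 8, 16                                   # toy sizes
    rng = np.random.default_rng(0)
    params = (rng.normal(size=(T, D, K)), np.zeros((T, K)),
              rng.normal(size=(T, K, D)), np.zeros((T, D)))
    out = rankmixer_block(rng.normal(size=(T, D)), params, num_heads=T)
    print(out.shape)
```

Under these assumptions the mixing step is a reshape and transpose with no learned weights, and the per-token FFNs reduce to batched matrix multiplications, which is the kind of dense, parallel computation that tends to use GPUs well.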

Jie Zhu, Zhifang Fan, Xiaoxie Zhu, Yuchen Jiang, Hangyu Wang, Xintian Han, Haoran Ding, Xinmin Wang, Wenlin Zhao, Zhen Gong, Huizhi Yang, Zheng Chai, Zhe Chen, Yuchao Zheng, Qiwei Chen, Feng Zhang, Xun Zhou, Peng Xu, Xiao Yang, Di Wu, Zuotao Liu • 2025

Related benchmarks

Task	Dataset	Result
CTR Prediction	Criteo	AUC 0.8092	309
Click-Through Rate Prediction	Avazu (test)	AUC 0.7927	207
CTR Prediction	Avazu	AUC 77.72	171
CTR Prediction	Criteo (test)	AUC 0.8137	147
Click-Through Rate Prediction	Criteo (test)	AUC 0.8008	57
CTR Prediction	Industrial	AUC 83.22	33
Sequential Recommendation	KuaiVideo	AUC 74.82	25
CTR Prediction	Industrial max sequence length 1,600 (test)	AUC 0.7221	18
CTR Prediction	Taobao latest 600 clicked items (test)	AUC 63.41	18
CTR Prediction	SQS	AUC 91.05	16

Showing 10 of 45 rows
