
Multi-Probe Zero Collision Hash (MPZCH): Mitigating Embedding Collisions and Enhancing Model Freshness in Large-Scale Recommenders

About

Embedding tables are critical components of large-scale recommendation systems, facilitating the efficient mapping of high-cardinality categorical features into dense vector representations. However, as the volume of unique IDs expands, traditional hash-based indexing methods suffer from collisions that degrade model performance and personalization quality. We present Multi-Probe Zero Collision Hash (MPZCH), a novel indexing mechanism based on linear probing that effectively mitigates embedding collisions. With reasonable table sizing, it often eliminates these collisions entirely while maintaining production-scale efficiency. MPZCH utilizes auxiliary tensors and high-performance CUDA kernels to implement configurable probing and active eviction policies. By retiring obsolete IDs and resetting reassigned slots, MPZCH prevents the stale embedding inheritance typical of hash-based methods, ensuring new features learn effectively from scratch. Despite its collision-mitigation overhead, the system maintains training QPS and inference latency comparable to existing methods. Rigorous online experiments demonstrate that MPZCH achieves zero collisions for user embeddings and significantly improves item embedding freshness and quality. The solution has been released within the open-source TorchRec library for the broader community.
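The probing and eviction behaviour described above can be illustrated with a minimal pure-Python sketch. This is not the TorchRec implementation or its API: the class name `ProbingIdTable`, the parameters `max_probe` and `ttl`, the modulo stand-in for a hash function, and the time-based eviction rule are all assumptions made for illustration. The released system uses auxiliary tensors and CUDA kernels rather than Python lists.

```python
from typing import List, Optional

EMPTY = -1  # sentinel: slot currently has no owner


class ProbingIdTable:
    """Hypothetical sketch of linear probing with eviction and slot reset."""

    def __init__(self, num_slots: int, max_probe: int, ttl: int, dim: int = 2) -> None:
        self.num_slots = num_slots
        self.max_probe = max_probe  # assumed cap on slots probed per lookup
        self.ttl = ttl              # assumed eviction rule: idle longer than ttl
        self.dim = dim
        # Auxiliary state: which raw ID owns each slot, and when it was last seen.
        self.identities: List[int] = [EMPTY] * num_slots
        self.last_seen: List[int] = [0] * num_slots
        self.embeddings: List[List[float]] = [[0.0] * dim for _ in range(num_slots)]

    def _claimable(self, slot: int, now: int) -> bool:
        # A slot can be taken if it is empty or its owner has gone stale.
        return self.identities[slot] == EMPTY or now - self.last_seen[slot] > self.ttl

    def lookup(self, raw_id: int, now: int) -> Optional[int]:
        # Modulo stands in for a real hash function (assumption).
        start = raw_id % self.num_slots
        window = [(start + i) % self.num_slots for i in range(self.max_probe)]
        # Pass 1: the ID may already own a slot anywhere in its probe window.
        for slot in window:
            if self.identities[slot] == raw_id:
                self.last_seen[slot] = now
                return slot
        # Pass 2: claim the first empty or stale slot and reset its embedding,
        # so the new ID does not inherit the previous owner's vector.
        for slot in window:
            if self._claimable(slot, now):
                self.identities[slot] = raw_id
                self.last_seen[slot] = now
                self.embeddings[slot] = [0.0] * self.dim
                return slot
        return None  # probe window exhausted; the caller needs a fallback
```

In this sketch, two IDs that hash to the same starting slot receive different slots as long as the probe window has room, and an ID that reclaims a stale slot starts from a zeroed embedding. What happens when the window is full (here, returning `None`) is left to the caller and is not taken from the paper.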

Ziliang Zhao, Bi Xue, Emma Lin, Mengjiao Zhou, Kaustubh Vartak, Shakhzod Ali-Zade, Tianqi Lu, Tao Li, Bin Kuang, Rui Jian, Bin Wen, Dennis van der Staay, Yixin Bao, Eddy Li, Chao Deng, Songbin Liu, Qifan Wang, Kai Ren (2026)

Related benchmarks

Task | Dataset | Result | Rank
Intra-creator embedding similarity | Video Dataset (Same Day) | Intra-creator Similarity: 91 | 2
Intra-creator embedding similarity | Video Dataset Same & Next Day | Intra-creator Similarity: 91 | 2
Intra-creator embedding similarity | Video Dataset Overall | Intra-creator Embedding Similarity: 77 | 2
Share Prediction | Production User Embedding Table (online production) | NE Improvement: 0.0038 | 1
Skip Prediction | User Embedding Table (online production) | NE Improvement: 0.09 | 1
Video View Duration (VVD) Prediction | User Embedding Table (online production) | NE Improvement: 0.0012 | 1
Video View Percentage 100% (VVP100) Prediction | User Embedding Table (online production) | NE Improvement: 12 | 1