Stop Treating Collisions Equally: Qualification-Aware Semantic ID Learning for Recommendation at Industrial Scale
About
Semantic IDs (SIDs) are compact discrete representations derived from multimodal item features, serving as a unified abstraction for ID-based and generative recommendation. However, learning high-quality SIDs remains challenging due to two issues. (1) Collision problem: the quantized token space is prone to collisions, in which semantically distinct items are assigned identical or overly similar SID compositions, resulting in semantic entanglement. (2) Collision-signal heterogeneity: collisions are not uniformly harmful. Some reflect genuine conflicts between semantically unrelated items, while others stem from benign redundancy or systematic data effects. To address these challenges, we propose Qualification-Aware Semantic ID Learning (QuaSID), an end-to-end framework that learns collision-qualified SIDs by selectively repelling qualified conflict pairs and scaling the repulsion strength by collision severity. QuaSID consists of two mechanisms: Hamming-guided Margin Repulsion, which translates low-Hamming SID overlaps into explicit, severity-scaled geometric constraints on the encoder space; and Conflict-Aware Valid Pair Masking, which masks protocol-induced benign overlaps to denoise repulsion supervision. In addition, QuaSID incorporates a dual-tower contrastive objective to inject collaborative signals into tokenization. Experiments on public benchmarks and industrial data validate QuaSID. On public datasets, QuaSID consistently outperforms strong baselines, improving top-K ranking quality by 5.9% over the best baseline while increasing SID composition diversity. In an online A/B test on Kuaishou e-commerce with a 5% traffic split, QuaSID increases ranking GMV-S2 by 2.38% and improves completed orders on cold-start retrieval by up to 6.42%. Finally, we show that the proposed repulsion loss is plug-and-play and enhances a range of SID learning frameworks across datasets.
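The two mechanisms above can be illustrated with a minimal sketch. It is not the paper's formulation: the abstract gives no loss equation, so the hinge form, the linear severity scaling, and every name here (`repulsion_loss`, `benign_pairs`, `max_hamming`, `base_margin`) are assumptions chosen for illustration. The sketch treats a pair of items as a conflict candidate when their SIDs differ in at most `max_hamming` positions, skips pairs flagged as benign (standing in for Conflict-Aware Valid Pair Masking), and pushes the remaining pairs apart in the encoder space with a margin that grows as the Hamming distance shrinks (standing in for Hamming-guided Margin Repulsion).

```python
import math
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length SID tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def repulsion_loss(sids, embeddings, benign_pairs=frozenset(),
                   max_hamming=1, base_margin=1.0):
    """Mean hinge repulsion over qualified conflict pairs (illustrative only).

    sids:         list of equal-length tuples of token ids, one per item
    embeddings:   list of encoder vectors, aligned with sids
    benign_pairs: set of (i, j) index pairs with i < j to exclude
                  (hypothetical stand-in for the paper's valid-pair mask)
    """
    num_levels = len(sids[0])
    total, count = 0.0, 0
    for i, j in combinations(range(len(sids)), 2):
        if (i, j) in benign_pairs:
            continue  # masked: overlap is treated as benign, no repulsion
        d = hamming(sids[i], sids[j])
        if d > max_hamming:
            continue  # SIDs already differ enough, not a collision
        # Assumed severity: 1.0 for identical SIDs, smaller as d grows.
        severity = (num_levels - d) / num_levels
        margin = base_margin * severity
        dist = math.dist(embeddings[i], embeddings[j])
        total += max(0.0, margin - dist)  # penalize pairs closer than margin
        count += 1
    return total / count if count else 0.0


# Toy data: items 0 and 1 share a SID exactly, item 2 differs in one token.
sids = [(3, 7, 1), (3, 7, 1), (3, 7, 2), (5, 0, 4)]
embeddings = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.5), (3.0, 3.0)]

loss_all = repulsion_loss(sids, embeddings)
loss_masked = repulsion_loss(sids, embeddings, benign_pairs={(0, 1)})
print(loss_all, loss_masked)
```

In this toy setup, masking the exact-collision pair (0, 1) removes the largest hinge term, so the masked loss is lower than the unmasked one. In a real training loop the loss would be computed on differentiable encoder outputs within a batch rather than on Python tuples.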
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Multimodal Recommendation | Toys | Recall@3 | 1.95 | 9 |
| Multimodal Recommendation | Beauty | Recall@3 | 2.01 | 9 |
| Generative Recommendation | Amazon-Beauty 5-core (test) | HR@5 | 2.77 | 8 |
| Generative Recommendation | Amazon Toys 5-core (test) | HR@5 | 2.66 | 8 |
| Retrieval | Kuaishou e-commerce General Online Traffic | Completed Orders | 1.09 | 2 |
| Ranking | Kuaishou e-commerce General Online Traffic | Completed Orders | 0.2 | 1 |
| Ranking | Kuaishou e-commerce Cold-start 100vv | Completed Orders | 1.77 | 1 |
| Ranking | Kuaishou e-commerce Cold-start 600vv | Completed Orders | 2.64 | 1 |
| Retrieval | Kuaishou e-commerce Cold-start 100vv | Completed Orders Change | 6.42 | 1 |
| Retrieval | Kuaishou e-commerce Cold-start 600vv | Completed Orders | 4.69 | 1 |