
CEMG: Collaborative-Enhanced Multimodal Generative Recommendation

About

Generative recommendation models often struggle with two key challenges: (1) the superficial integration of collaborative signals, and (2) the decoupled fusion of multimodal features. These limitations hinder the creation of a truly holistic item representation. To overcome this, we propose CEMG, a novel Collaborative-Enhanced Multimodal Generative Recommendation framework. Our approach features a Multimodal Fusion Layer that dynamically integrates visual and textual features under the guidance of collaborative signals. Subsequently, a Unified Modality Tokenization stage employs a Residual Quantization VAE (RQ-VAE) to convert this fused representation into discrete semantic codes. Finally, in the End-to-End Generative Recommendation stage, a large language model is fine-tuned to autoregressively generate these item codes. Extensive experiments demonstrate that CEMG significantly outperforms state-of-the-art baselines.
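The core of the tokenization stage described above is residual quantization: an item's fused embedding is mapped to a short sequence of discrete codes, one per codebook level, with each level quantizing what the previous levels left unexplained. The sketch below illustrates only this generic RQ mechanism with toy NumPy codebooks; the function name, dimensions, and codebook sizes are illustrative assumptions, not the paper's actual RQ-VAE implementation (which learns the codebooks jointly with an encoder-decoder).

```python
import numpy as np

def residual_quantize(x, codebooks):
    """Map a vector to one discrete code per codebook level.

    At each level, pick the codeword nearest to the current residual,
    record its index, then subtract it so the next level quantizes
    the remaining error. The code sequence is the item's "semantic ID".
    """
    codes = []
    residual = x.astype(float)
    for cb in codebooks:  # cb has shape (K, d): K codewords of dim d
        dists = np.linalg.norm(cb - residual, axis=1)
        idx = int(np.argmin(dists))
        codes.append(idx)
        residual = residual - cb[idx]
    return codes

# Toy setup: 3 levels, 8 codewords each, a 4-dim fused item embedding.
rng = np.random.default_rng(0)
codebooks = [rng.normal(size=(8, 4)) for _ in range(3)]
item_vec = rng.normal(size=4)
codes = residual_quantize(item_vec, codebooks)
print(codes)  # three integers in [0, 8), e.g. the item's discrete code path
```

In the full framework these code sequences become the vocabulary the fine-tuned language model generates autoregressively, so recommending an item reduces to decoding its code path token by token.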

Yuzhen Lin, Hongyi Chen, Xuanjing Chen, Shaowen Wang, Ivonne Xu, Dongming Jiang • 2025

Related benchmarks

| Task | Dataset | Result (HR@10) | Rank |
|---|---|---|---|
| Multimodal Generative Recommendation | Beauty | 6.65 | 10 |
| Multimodal Generative Recommendation | Sports | 3.63 | 10 |
| Multimodal Generative Recommendation | Yelp | 4.58 | 10 |
| Cold-start recommendation | Beauty (test) | 0.0305 | 4 |
| Cold-start recommendation | Sports (test) | 1.83 | 4 |
| Cold-start recommendation | Yelp (test) | 2.31 | 4 |
