CEMG: Collaborative-Enhanced Multimodal Generative Recommendation
About
Generative recommendation models often struggle with two key challenges: (1) the superficial integration of collaborative signals, and (2) the decoupled fusion of multimodal features. These limitations hinder the creation of a truly holistic item representation. To overcome this, we propose CEMG, a novel Collaborative-Enhanced Multimodal Generative Recommendation framework. Our approach features a Multimodal Fusion Layer that dynamically integrates visual and textual features under the guidance of collaborative signals. Subsequently, a Unified Modality Tokenization stage employs a Residual Quantization VAE (RQ-VAE) to convert this fused representation into discrete semantic codes. Finally, in the End-to-End Generative Recommendation stage, a large language model is fine-tuned to autoregressively generate these item codes. Extensive experiments demonstrate that CEMG significantly outperforms state-of-the-art baselines.
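The tokenization stage above can be sketched as a residual quantization loop: at each level, the current residual is matched to its nearest codebook entry, the index is recorded as one digit of the item's semantic ID, and the entry is subtracted before the next level. This is a minimal illustrative sketch, not CEMG's actual implementation; the function name `residual_quantize`, the codebook sizes, and the embedding dimension are all assumptions.

```python
import numpy as np

def residual_quantize(x, codebooks):
    """Quantize a fused item embedding into discrete semantic codes.

    At each level, pick the codebook entry nearest to the current
    residual, record its index, and subtract the entry before moving
    to the next level (the core idea behind RQ-VAE tokenization).
    """
    codes = []
    residual = np.asarray(x, dtype=np.float64).copy()
    for codebook in codebooks:                      # codebook shape: (K, d)
        dists = np.linalg.norm(codebook - residual, axis=1)
        idx = int(np.argmin(dists))                 # nearest entry at this level
        codes.append(idx)
        residual = residual - codebook[idx]         # quantize the remainder next
    return codes

# Toy setup (dimensions are illustrative assumptions)
rng = np.random.default_rng(0)
d, K, levels = 8, 16, 3
codebooks = [rng.normal(size=(K, d)) for _ in range(levels)]
fused = rng.normal(size=d)          # stand-in for the fused multimodal embedding
codes = residual_quantize(fused, codebooks)
print(codes)                        # a 3-level semantic ID, e.g. three indices in [0, 16)
```

In the full framework, these per-level indices would serve as the discrete item tokens that the fine-tuned language model learns to generate autoregressively.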
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Multimodal Generative Recommendation | Beauty | HR@10 | 6.65 | 10 |
| Multimodal Generative Recommendation | Sports | HR@10 | 3.63 | 10 |
| Multimodal Generative Recommendation | Yelp | HR@10 | 4.58 | 10 |
| Cold-start recommendation | Beauty (test) | HR@10 | 0.0305 | 4 |
| Cold-start recommendation | Sports (test) | HR@10 | 1.83 | 4 |
| Cold-start recommendation | Yelp (test) | HR@10 | 2.31 | 4 |