
VENOMREC: Cross-Modal Interactive Poisoning for Targeted Promotion in Multimodal LLM Recommender Systems

About

Multimodal large language models (MLLMs) are pushing recommender systems (RecSys) toward content-grounded retrieval and ranking via cross-modal fusion. We find that while cross-modal consensus often mitigates conventional poisoning that manipulates interaction logs or perturbs a single modality, it also introduces a new attack surface where synchronised multimodal poisoning can reliably steer fused representations along stable semantic directions during fine-tuning. To characterise this threat, we formalise cross-modal interactive poisoning and propose VENOMREC, which performs Exposure Alignment to identify high-exposure regions in the joint embedding space and Cross-modal Interactive Perturbation to craft attention-guided coupled token-patch edits. Experiments on three real-world multimodal datasets demonstrate that VENOMREC consistently outperforms strong baselines, achieving 0.73 mean ER@20 and improving over the strongest baseline by +0.52 absolute ER points on average, while maintaining comparable recommendation utility.
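This page does not define ER@20. A common reading in targeted-promotion work is that ER@K is the fraction of users whose top-K recommendation list contains a target item. The minimal Python sketch below assumes that definition. The function name `exposure_ratio_at_k` and the toy rankings are hypothetical and do not come from the paper. Under this reading, "+0.52 absolute ER points" is a plain difference between two such fractions.

```python
from typing import Sequence, Set


def exposure_ratio_at_k(
    ranked_lists: Sequence[Sequence[int]],
    target_items: Set[int],
    k: int = 20,
) -> float:
    """Fraction of users whose top-k list contains at least one target item.

    Assumed definition of ER@K for illustration; not taken from the paper.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if not ranked_lists:
        return 0.0
    # A user counts as "exposed" if any target item appears in their top-k slots.
    hits = sum(
        1
        for ranking in ranked_lists
        if any(item in target_items for item in ranking[:k])
    )
    return hits / len(ranked_lists)


# Toy example (hypothetical data): three users, item 7 is the promoted target.
rankings = [
    [3, 7, 1, 9],
    [4, 5, 6, 2],
    [7, 8, 0, 1],
]
# 2 of 3 users see item 7 in their top-2 list.
print(exposure_ratio_at_k(rankings, {7}, k=2))
```

Averaging this value over several target items or datasets would give a "mean ER@K" figure, though the paper may aggregate differently.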

Guowei Guan, Yurong Hao, Jiaming Zhang, Tiantong Wu, Fuyao Zhang, Tianxiang Chen, Longtao Huang, Cyril Leung, Wei Yang Bryan Lim • 2026

Related benchmarks

Task | Dataset | HR@5 (%) | Rank
--- | --- | --- | ---
Multimodal Recommendation | Amazon Sports Few-Shot (test) | 16.99 | 12
Multimodal Recommendation | Amazon Clothing Zero-Shot (test) | 14.02 | 12
Multimodal Recommendation | Amazon Sports Zero-Shot (test) | 17.17 | 12
Multimodal Recommendation | Amazon Clothing Few-Shot (test) | 14.02 | 12
Multimodal Recommendation | Amazon Toys Few-Shot (test) | 13.41 | 12
Multimodal Recommendation | Amazon Toys Zero-Shot (test) | 13.97 | 12
