VENOMREC: Cross-Modal Interactive Poisoning for Targeted Promotion in Multimodal LLM Recommender Systems
About
Multimodal large language models (MLLMs) are pushing recommender systems (RecSys) toward content-grounded retrieval and ranking via cross-modal fusion. We find that while cross-modal consensus often mitigates conventional poisoning that manipulates interaction logs or perturbs a single modality, it also introduces a new attack surface where synchronised multimodal poisoning can reliably steer fused representations along stable semantic directions during fine-tuning. To characterise this threat, we formalise cross-modal interactive poisoning and propose VENOMREC, which performs Exposure Alignment to identify high-exposure regions in the joint embedding space and Cross-modal Interactive Perturbation to craft attention-guided coupled token-patch edits. Experiments on three real-world multimodal datasets demonstrate that VENOMREC consistently outperforms strong baselines, achieving 0.73 mean ER@20 and improving over the strongest baseline by +0.52 absolute ER points on average, while maintaining comparable recommendation utility.
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Multimodal Recommendation | Amazon Sports Few-Shot (test) | HR (Top-5) | 16.99 | 12 |
| Multimodal Recommendation | Amazon Clothing Zero-Shot (test) | HR@5 | 14.02 | 12 |
| Multimodal Recommendation | Amazon Sports Zero-Shot (test) | HR@5 | 0.1717 | 12 |
| Multimodal Recommendation | Amazon Clothing Few-Shot (test) | HR (Top-5) | 0.1402 | 12 |
| Multimodal Recommendation | Amazon Toys Few-Shot (test) | HR (Top-5) | 0.1341 | 12 |
| Multimodal Recommendation | Amazon Toys Zero-Shot (test) | HR@5 | 13.97 | 12 |
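
The benchmark rows above report a hit ratio at a top-5 cutoff, with some values that look like fractions (e.g. 0.1402) and others that look like percentages (e.g. 14.02). As a rough illustration of how such a metric is commonly computed, the sketch below uses the standard definition of HR@K: the fraction of users whose held-out item appears in their top-K ranked list. The function name `hit_ratio_at_k` and the toy data are hypothetical, and the exact evaluation protocol behind these leaderboard numbers is not stated here.

```python
from typing import Sequence


def hit_ratio_at_k(
    ranked_lists: Sequence[Sequence[int]],
    targets: Sequence[int],
    k: int = 5,
) -> float:
    """Return the fraction of users whose held-out item is in their top-k list.

    Standard HR@K definition; this is an assumption, not the protocol
    confirmed for the table above.
    """
    if len(ranked_lists) != len(targets):
        raise ValueError("ranked_lists and targets must have the same length")
    if not targets:
        return 0.0
    hits = sum(
        1 for ranked, target in zip(ranked_lists, targets) if target in ranked[:k]
    )
    return hits / len(targets)


# Toy data (hypothetical): one ranked item list and one held-out item per user.
ranked = [
    [3, 7, 1, 9, 4, 2],
    [5, 8, 6, 0, 2, 1],
    [9, 1, 3, 7, 5, 4],
    [2, 6, 8, 0, 3, 1],
]
targets = [4, 1, 9, 5]

hr_fraction = hit_ratio_at_k(ranked, targets, k=5)  # fraction scale
hr_percent = 100.0 * hr_fraction                    # percentage scale
print(f"HR@5 = {hr_fraction:.4f} ({hr_percent:.2f}%)")
```

When comparing rows, it helps to confirm whether a given value is on the fraction or the percentage scale before reading differences between datasets or settings.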