
LAMP: Learning Universal Adversarial Perturbations for Multi-Image Tasks via Pre-trained Models

About

Multimodal Large Language Models (MLLMs) have achieved remarkable performance across vision-language tasks. Recent advancements allow these models to process multiple images as inputs. However, the vulnerabilities of multi-image MLLMs remain unexplored. Existing adversarial attacks focus on single-image settings and often assume a white-box threat model, which is impractical in many real-world scenarios. This paper introduces LAMP, a black-box method for learning Universal Adversarial Perturbations (UAPs) targeting multi-image MLLMs. LAMP applies an attention-based constraint that prevents the model from effectively aggregating information across images. LAMP also introduces a novel cross-image contagious constraint that forces perturbed tokens to influence clean tokens, spreading adversarial effects without requiring all inputs to be modified. Additionally, an index-attention suppression loss enables a robust position-invariant attack. Experimental results show that LAMP outperforms SOTA baselines and achieves the highest attack success rates across multiple vision-language tasks and models.
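The paper's attack targets real multi-image MLLMs, but the core black-box idea can be illustrated in miniature: estimate gradients by querying the model, and optimize a single universal perturbation that suppresses cross-image information aggregation. The sketch below is purely illustrative and assumes nothing from the paper's actual implementation: `toy_attention_score` is a stand-in surrogate for cross-image aggregation (pairwise feature similarity), and `learn_uap` uses an NES-style two-point gradient estimate rather than LAMP's losses.

```python
import numpy as np

def toy_attention_score(imgs):
    """Toy surrogate for how strongly a model aggregates information
    across images: sum of off-diagonal cosine similarities between
    per-image feature vectors (here, just the flattened pixels)."""
    feats = imgs.reshape(imgs.shape[0], -1)
    feats = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8)
    sim = feats @ feats.T
    return (sim - np.diag(np.diag(sim))).sum()

def learn_uap(images, eps=0.1, steps=30, n_dirs=10, sigma=0.01, lr=0.02, seed=0):
    """Black-box universal perturbation via NES-style gradient estimation.
    One shared delta is applied to every image; only score queries are
    used (no model gradients), mimicking a black-box threat model."""
    rng = np.random.default_rng(seed)
    delta = np.zeros_like(images[0])
    best_delta, best_loss = delta.copy(), toy_attention_score(images)
    for _ in range(steps):
        grad = np.zeros_like(delta)
        for _ in range(n_dirs):
            u = rng.standard_normal(delta.shape)
            # Two-point estimate of the directional derivative.
            lp = toy_attention_score(np.clip(images + delta + sigma * u, 0.0, 1.0))
            lm = toy_attention_score(np.clip(images + delta - sigma * u, 0.0, 1.0))
            grad += ((lp - lm) / (2.0 * sigma)) * u
        # Signed descent step, projected back into the L-inf ball of radius eps.
        delta = np.clip(delta - lr * np.sign(grad / n_dirs), -eps, eps)
        loss = toy_attention_score(np.clip(images + delta, 0.0, 1.0))
        if loss < best_loss:
            best_loss, best_delta = loss, delta.copy()
    return best_delta, best_loss
```

In this toy, minimizing the surrogate pushes the perturbed images' features apart so less information can be aggregated across them; the real method replaces the surrogate with attention-based, cross-image contagious, and index-attention suppression losses on actual MLLM internals.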

Alvi Md Ishmam, Najibul Haque Sarker, Zaber Ibn Abdul Hakim, Chris Thomas • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Adversarial Attack | Mantis-Eval | Attack Success Rate | 84.57 | 37 |
| Adversarial Attack | NLVR2 | Attack Success Rate | 67.51 | 37 |
| Adversarial Attack | BLINK | Attack Success Rate (ASR) | 87.65 | 37 |
| Adversarial Attack | Q-Bench | Attack Success Rate | 87.23 | 37 |
| Adversarial Attack | MVBench | ASR | 83.84 | 37 |
| Visual Question Answering | MM-Vet | -- | -- | 27 |
| Visual Question Answering | OK-VQA | VQA Score | 80.7 | 18 |
| Visual Question Answering | LLaVA-Bench | VQA ASR | 68.31 | 12 |
| Visual Question Answering | Mantis-Eval | ASR | 71.32 | 12 |
| Image Captioning | MS-COCO | ASR (Average Sentence Rate) | 78.3 | 6 |
