The Cost of Context: Mitigating Textual Bias in Multimodal Retrieval-Augmented Generation

About

While Multimodal Large Language Models (MLLMs) are increasingly integrated with Retrieval-Augmented Generation (RAG) to mitigate hallucinations, the introduction of external documents can conceal severe failure modes at the instance level. We identify and formalize the phenomenon of recorruption, where the introduction of even perfectly accurate "oracle" context causes a capable model to abandon an initially correct prediction. Through a mechanistic diagnosis of internal attention matrices, we show that recorruption is driven by a two-fold attentional collapse: (1) visual blindness, characterized by the systemic suppression of visual attention mass ($M_{vis}$) and sharpness ($S_{vis}$), and (2) a structural positional bias that forces the model to prioritize boundary tokens over semantic relevance. Our analysis reveals an Illusion of Success, demonstrating that many seemingly correct RAG outcomes are merely positional coincidences where the model's textual copying bias happens to align with the ground-truth location. To address these vulnerabilities, we propose Bottleneck Attention Intervention for Recovery (BAIR), a parameter-free, inference-time framework that restores visual saliency and applies position-aware penalties to textual distractors. Across medical factuality, social fairness, and geospatial benchmarks, BAIR successfully restores multimodal grounding and improves diagnostic reliability without requiring model retraining or fine-tuning.
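
To make the two visual-attention quantities in the abstract concrete, here is a minimal sketch of how they could be computed from a single row of an attention matrix. The abstract does not define $M_{vis}$ or $S_{vis}$, so the definitions used here are assumptions for illustration only: mass is the total attention assigned to image tokens, and sharpness is one minus the normalized entropy of the attention distribution within the image tokens. The function name `visual_attention_stats` and its arguments are hypothetical and are not taken from the paper.

```python
import numpy as np


def visual_attention_stats(attn, visual_mask):
    """Return (mass, sharpness) of attention over visual tokens.

    ASSUMED definitions (not from the paper):
      mass      = sum of attention weights on visual tokens
      sharpness = 1 - H(p) / log(n), where p is the attention
                  renormalized over the n visual tokens and H is
                  Shannon entropy (0 = uniform, 1 = single peak)

    attn:        1-D attention weights of one query over all tokens
    visual_mask: boolean array, True where the token is an image token
    """
    attn = np.asarray(attn, dtype=float)
    visual_mask = np.asarray(visual_mask, dtype=bool)

    vis = attn[visual_mask]
    mass = float(vis.sum())

    # Sharpness is undefined with no visual attention or fewer than
    # two visual tokens; report 0.0 in those cases.
    if mass <= 0.0 or vis.size < 2:
        return mass, 0.0

    p = vis / mass                      # renormalize within visual tokens
    nz = p[p > 0]                       # skip zeros to avoid log(0)
    entropy = float(-(nz * np.log(nz)).sum())
    sharpness = 1.0 - entropy / float(np.log(vis.size))
    return mass, float(sharpness)


# Four image tokens followed by one text token.
mask = [True, True, True, True, False]

# Diffuse visual attention, most weight on the text token.
diffuse = visual_attention_stats([0.1, 0.1, 0.1, 0.1, 0.6], mask)

# Visual attention concentrated on a single image token.
peaked = visual_attention_stats([0.7, 0.0, 0.0, 0.0, 0.3], mask)
```

Under these assumed definitions, the "visual blindness" described above would show up as both values falling once retrieved text is added to the context: less attention reaches the image tokens, and what remains is spread more evenly across them.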

Hoin Jung, Xiaoqian Wang • 2026

Related benchmarks

Task	Dataset	Result
Scene Classification	NWPU RESISC45 (test)	Top-1 Accuracy: 96.74	28
Medical factuality evaluation	IU-Chest X-ray (test)	Accuracy: 68.31	22
Multimodal Retrieval-Augmented Generation	NWPU (test)	Accuracy: 96.74	22
Multimodal Retrieval-Augmented Generation	FACET	Accuracy: 92.58	22
Multimodal Retrieval-Augmented Generation	IU-Chest	Accuracy: 68.31	22
Social fairness evaluation	FACET (test)	Accuracy: 92.58	22
