Rethinking Visual Neglect: Steering via Context-Preference for MLLM Hallucination Mitigation

About

Object hallucination remains a primary obstacle to the reliable deployment of Multimodal Large Language Models (MLLMs). Current inference-time mitigation methods mostly assume that hallucinations stem from visual neglect, and therefore steer models toward stronger visual reliance. In contrast, our systematic interventions on multiple MLLMs show that pushing toward more visual reliance can exacerbate hallucinations on some models, while reducing visual reliance can mitigate them. This result suggests that attributing hallucinations solely to insufficient visual reliance leaves the cause underdetermined. We argue that the image, as a context, competes simultaneously with the model's parametric knowledge and with the textual context. To address this, we propose a training-free framework, Context-Preference Activation Steering (CAS). CAS extracts two semantically distinct Context Preference Vectors (CPVs) from two small sets of designed conflict samples and applies them through single-pass signed residual injection at mid-early MLP layers during inference, controlling which information source the model relies on. Experiments show that CAS substantially mitigates object hallucinations without increasing decoding latency while preserving native text-generation quality.
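The extraction and injection steps described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a common activation-steering recipe: each preference vector is the normalized difference of mean activations between the two sides of a conflict set, and steering adds a signed multiple of each vector to a hidden state once per forward pass. The function names (`mean_vector`, `preference_vector`, `steer`), the strength parameter `alpha`, and the reading of the two vectors as image-versus-parametric-knowledge and image-versus-textual-context are all hypothetical.

```python
# Illustrative sketch only: difference-of-means steering on toy vectors.
# All names and the extraction rule are assumptions, not the paper's code.

def mean_vector(rows):
    """Element-wise mean of a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def preference_vector(acts_prefer, acts_avoid):
    """Assumed extraction rule: unit-normalized difference between the mean
    activations of the two sides of a conflict-sample set."""
    mean_p = mean_vector(acts_prefer)
    mean_a = mean_vector(acts_avoid)
    diff = [p - a for p, a in zip(mean_p, mean_a)]
    norm = sum(d * d for d in diff) ** 0.5
    if norm == 0.0:
        raise ValueError("conflict sets produce identical mean activations")
    return [d / norm for d in diff]

def steer(hidden, steering):
    """Signed residual injection: h' = h + sum_i alpha_i * v_i.

    `steering` is a list of (vector, alpha) pairs. A positive alpha pushes
    toward the preferred source, a negative alpha pushes away from it.
    """
    out = list(hidden)
    for vector, alpha in steering:
        out = [h + alpha * v for h, v in zip(out, vector)]
    return out

# Toy 2-dimensional activations standing in for MLP-layer outputs.
v_a = preference_vector([[1.0, 0.0], [3.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]])
v_b = preference_vector([[0.0, 2.0]], [[0.0, 0.0]])
steered = steer([0.5, 0.5], [(v_a, -2.0), (v_b, 1.0)])
```

In a real model the same addition would typically be applied inside a forward hook on the chosen layers, so it adds no extra decoding passes. The signs and magnitudes of `alpha` would need to be chosen per model, which matches the abstract's observation that the helpful steering direction differs across MLLMs.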

Jingwen Wu, Xijun Zhang, Ge Song • 2026

Related benchmarks

Task	Dataset	Metric	Result
Generative Hallucination	AMBER Generative	Coverage (%)	51.9	81
Object Probing	POPE (average)	Accuracy	87.84	52
Object Hallucination Assessment	COCO 500 images	CHAIR Score (Scene)	44	36

