
Rethinking Visual Neglect: Steering via Context-Preference for MLLM Hallucination Mitigation

About

Object hallucination remains a primary obstacle to the reliable deployment of Multimodal Large Language Models (MLLMs). Current inference-time mitigation methods mainly assume that hallucinations stem from visual neglect, and therefore steer models toward greater visual reliance. In contrast, our systematic interventions on multiple MLLMs show that pushing toward more visual reliance may exacerbate hallucinations on some models, while pushing toward less may mitigate them. This result suggests that visual insufficiency alone does not fully account for hallucinations. We argue that the image, as a context, simultaneously competes with the model's parametric knowledge and with the textual context. To address this, we propose a training-free framework, Context-Preference Activation Steering (CAS). It extracts two semantically distinct Context Preference Vectors (CPVs) from two small sets of designed conflict samples and applies them through single-pass signed residual injection at mid-early MLP layers during inference to control which information source the model relies on. Experiments show that CAS substantially mitigates object hallucinations without increasing decoding latency, while preserving native text-generation quality.
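The abstract describes the mechanism only at a high level, so the following is a minimal sketch of the general idea rather than the authors' implementation. It assumes a difference-of-means recipe for the preference vector, which is a common way to build steering vectors but is not confirmed by the text above. All names (`preference_vector`, `steer`, `alpha`) and the toy activations are hypothetical.

```python
from math import sqrt


def mean_vector(rows):
    """Element-wise mean of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def preference_vector(acts_prefer, acts_other):
    """Unit-norm difference of mean activations between two conflict conditions.

    ASSUMPTION: difference-of-means is used here for illustration only;
    the abstract does not state how the CPVs are actually extracted.
    """
    mean_a = mean_vector(acts_prefer)
    mean_b = mean_vector(acts_other)
    diff = [a - b for a, b in zip(mean_a, mean_b)]
    norm = sqrt(sum(d * d for d in diff))
    if norm == 0.0:
        raise ValueError("both conditions have identical mean activations")
    return [d / norm for d in diff]


def steer(hidden, vector, alpha):
    """Signed residual injection: return hidden + alpha * vector.

    A positive alpha pushes toward the preferred context,
    a negative alpha pushes away from it.
    """
    return [h + alpha * v for h, v in zip(hidden, vector)]


# Toy activations standing in for one layer's outputs on conflict samples
# (hypothetical values; real activations would come from the model).
acts_image = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]  # model followed the image
acts_prior = [[0.0, 0.0, 2.0], [0.0, 0.0, 2.0]]  # model followed its prior

cpv = preference_vector(acts_image, acts_prior)
hidden = [0.5, 0.5, 0.5]
toward = steer(hidden, cpv, alpha=2.0)   # more reliance on the image
away = steer(hidden, cpv, alpha=-2.0)    # less reliance on the image
```

In a real model the addition would be applied to the output of the chosen MLP layers during the forward pass, so it adds no extra decoding passes. The sign of `alpha` is what allows steering in either direction, which matches the abstract's observation that some models benefit from less visual reliance, not more.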

Jingwen Wu, Xijun Zhang, Ge Song • 2026

Related benchmarks

Task                            | Dataset          | Result                   | Rank
Generative Hallucination        | AMBER Generative | Coverage (%): 51.9       | 81
Object Probing                  | POPE (average)   | Accuracy: 87.84          | 52
Object Hallucination Assessment | COCO 500 images  | CHAIR Score (Scene): 44  | 36
