One-shot Optimized Steering Vector for Hallucination Mitigation in VLMs
About
Vision Language Models (VLMs) achieve strong performance on multimodal tasks but still suffer from hallucination and safety-related failures that persist even at scale. Activation steering offers a lightweight technique for improving model behavior without retraining. However, existing steering approaches, whether input-dependent or input-independent, struggle to strike a meaningful trade-off between efficiency and effectiveness. In this work, we observe that steering vectors can generalize across inputs when tasks share aligned semantic intent. Based on this insight, we propose **OSGA** (**O**ne-shot **S**teering with **G**enerative **A**nchor), an input-independent framework that improves model performance from a single optimization instance. OSGA first selects an informative sample via a variance-based data selection strategy, then learns a single steering vector with a contrastive objective regularized by a generative anchor. The resulting vector can be applied universally at a fixed layer during inference without modifying model parameters. Experiments across multiple benchmarks show that a single OSGA-optimized steering vector consistently improves hallucination mitigation and safety with negligible overhead, highlighting one-shot steering as a practical and scalable route to reliable VLMs.
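The core mechanism described above, injecting a fixed vector into one layer's activations at inference time, can be sketched in a few lines. The snippet below is a minimal, framework-agnostic illustration with a toy two-layer network in NumPy; the layer choice, hidden size, and steering strength `alpha` are assumptions for demonstration, not OSGA's actual configuration or the learned vector.

```python
import numpy as np

# Toy stand-in for a frozen model: two linear layers with ReLU.
rng = np.random.default_rng(0)
hidden = 16
W1 = rng.standard_normal((hidden, hidden))
W2 = rng.standard_normal((hidden, hidden))

steer = rng.standard_normal(hidden)   # stand-in for the learned OSGA vector
alpha = 0.5                           # steering strength (illustrative value)

def forward(x, steering=None):
    h = np.maximum(x @ W1, 0.0)       # layer 1 activations
    if steering is not None:
        h = h + steering              # inject the steering vector at this layer
    return np.maximum(h @ W2, 0.0)    # layer 2 output; weights never change

x = rng.standard_normal((1, hidden))
baseline = forward(x)
steered = forward(x, steering=alpha * steer)
```

In a real VLM this addition would typically be implemented as a forward hook on the chosen transformer layer, which keeps the intervention entirely at inference time with no parameter updates.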
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Object Hallucination Evaluation | CHAIR | CS Score: 28.8 | 49 |
| Object Hallucination Evaluation | POPE (test) | -- | 44 |
| Object Hallucination | MSCOCO POPE (test) | Accuracy (Random): 0.8787 | 11 |
| Object Hallucination | A-OKVQA POPE (test) | Accuracy (Random): 90.13 | 8 |
| Hallucination Evaluation | MME-Hall LLaVA-v1.5 (test) | Total Score: 699.5 | 6 |
| Multimodal Safety Evaluation | GOAT (test) | Misogyny Accuracy: 56.9 | 2 |