TARAC: Mitigating Hallucination in LVLMs via Temporal Attention Real-time Accumulative Connection
About
Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities, yet they suffer from hallucinations that limit practical deployment. While various mitigation strategies exist, they often incur high computational overhead or require extensive retraining. In this paper, we address visual attention decay during generation, a key factor contributing to hallucinations: as decoding proceeds, the model's attention to image tokens fades and its outputs drift away from the visual evidence. We propose Temporal Attention Real-time Accumulative Connection (TARAC), a novel training-free framework that dynamically accumulates historical attention over image tokens and re-injects it at each generation step to sustain visual grounding. Inspired by cognitive reinforcement mechanisms, TARAC operates as a lightweight, plug-and-play module. Extensive experiments across diverse models (e.g., LLaVA, Qwen2-VL) and benchmarks demonstrate that TARAC significantly outperforms state-of-the-art methods. Remarkably, it achieves these gains with negligible inference overhead (~4% increase in time per output token, TPOT), compared to the substantial costs of existing training-free baselines. Specifically, TARAC reduces hallucinated sentences by 25.2% on CHAIR and improves the MME Perception score by 10.65 points, validating both its effectiveness and its efficiency.
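To make the accumulate-and-re-inject idea concrete, below is a minimal sketch of one decoding step. It is an illustration of the general mechanism described above, not the paper's actual implementation: the function name `tarac_step`, the coefficients `alpha` and `beta`, and the decayed-running-sum update rule are all assumptions chosen for clarity.

```python
import torch

def tarac_step(attn_weights: torch.Tensor,
               acc_vis: torch.Tensor | None,
               vision_slice: slice,
               alpha: float = 0.5,
               beta: float = 0.5) -> tuple[torch.Tensor, torch.Tensor]:
    """One hypothetical TARAC-style attention update during decoding.

    attn_weights: (batch, heads, 1, cur_len) post-softmax attention of the
                  token currently being generated over all cached positions.
    acc_vis:      running accumulation over the image-token positions,
                  shape (batch, heads, 1, n_vision), or None at the first step.
    vision_slice: positions of the image tokens within cur_len (fixed, since
                  image tokens sit at the start of the sequence in most LVLMs).
    alpha, beta:  illustrative re-injection and decay coefficients.
    """
    # Attention the current token pays to the image tokens at this step.
    vis = attn_weights[..., vision_slice]

    # Accumulate historical visual attention as a decayed running sum.
    acc_vis = vis if acc_vis is None else beta * acc_vis + vis

    # Re-inject the accumulated visual attention into the current step,
    # then renormalize so the weights still sum to 1 over all positions.
    boosted = attn_weights.clone()
    boosted[..., vision_slice] = vis + alpha * acc_vis
    boosted = boosted / boosted.sum(dim=-1, keepdim=True)
    return boosted, acc_vis


# Toy usage: 2 decoding steps over a cache whose first 8 positions are
# image tokens (all shapes and values are synthetic).
acc = None
for cur_len in (16, 17):
    attn = torch.softmax(torch.randn(1, 4, 1, cur_len), dim=-1)
    attn, acc = tarac_step(attn, acc, vision_slice=slice(0, 8))
```

Because the update only rescales existing attention rows and keeps a small per-head buffer over the image tokens, a hook like this adds essentially no latency, which is consistent with the reported ~4% TPOT overhead.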
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Object Hallucination | POPE (Random) | F1 Score | 89.01 | 285 |
| Object Hallucination | POPE (Popular) | F1 Score | 87.22 | 273 |
| Hallucination Evaluation | AMBER | CHAIR | 6.7 | 172 |
| Hallucination Evaluation | AMBER | CHAIR_s | 5 | 56 |
| Object Hallucination | POPE (Adversarial) | Accuracy | 84.72 | 55 |
| Image Captioning | CHAIR (test) | Cs Score | 43 | 22 |
| Perception Evaluation | MME Perception | Score | 1710 | 21 |
| Text Fluency Evaluation | AMBER | PPL | 113.13 | 9 |
| Hallucination Mitigation | SHR (test) | SPI | 4.93 | 9 |
| Object Hallucination | CHAIR (test) | CS (%) | 30 | 9 |