TARAC: Mitigating Hallucination in LVLMs via Temporal Attention Real-time Accumulative Connection
About
Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities, yet they suffer from hallucinations that limit practical deployment. While various mitigation strategies exist, they often incur high computational overhead or require extensive retraining. In this paper, we address visual attention decay during generation, a key factor contributing to hallucinations: as decoding proceeds, the model's attention to image tokens fades and its outputs drift away from the visual evidence. We propose Temporal Attention Real-time Accumulative Connection (TARAC), a novel training-free framework that dynamically accumulates historical attention over image tokens and re-injects it at each generation step to sustain visual grounding. Inspired by cognitive reinforcement mechanisms, TARAC operates as a lightweight, plug-and-play module. Extensive experiments across diverse models (e.g., LLaVA, Qwen2-VL) and benchmarks demonstrate that TARAC significantly outperforms state-of-the-art methods. Remarkably, it achieves these gains with negligible inference overhead (~4% increase in time per output token, TPOT), compared to the substantial costs of existing training-free baselines. Specifically, TARAC reduces hallucinated sentences by 25.2% on CHAIR and improves the MME Perception score by 10.65 points, validating both its effectiveness and its efficiency.
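To make the accumulate-and-re-inject idea concrete, below is a minimal sketch of one decoding step. It is an illustration of the general mechanism described above, not the paper's actual implementation: the function name `tarac_step`, the coefficients `alpha` and `beta`, and the decayed-running-sum update rule are all assumptions chosen for clarity.

```python
import torch

def tarac_step(attn_weights: torch.Tensor,
               acc_vis: torch.Tensor | None,
               vision_slice: slice,
               alpha: float = 0.5,
               beta: float = 0.5) -> tuple[torch.Tensor, torch.Tensor]:
    """One hypothetical TARAC-style attention update during decoding.

    attn_weights: (batch, heads, 1, cur_len) post-softmax attention of the
                  token currently being generated over all cached positions.
    acc_vis:      running accumulation over the image-token positions,
                  shape (batch, heads, 1, n_vision), or None at the first step.
    vision_slice: positions of the image tokens within cur_len (fixed, since
                  image tokens sit at the start of the sequence in most LVLMs).
    alpha, beta:  illustrative re-injection and decay coefficients.
    """
    # Attention the current token pays to the image tokens at this step.
    vis = attn_weights[..., vision_slice]

    # Accumulate historical visual attention as a decayed running sum.
    acc_vis = vis if acc_vis is None else beta * acc_vis + vis

    # Re-inject the accumulated visual attention into the current step,
    # then renormalize so the weights still sum to 1 over all positions.
    boosted = attn_weights.clone()
    boosted[..., vision_slice] = vis + alpha * acc_vis
    boosted = boosted / boosted.sum(dim=-1, keepdim=True)
    return boosted, acc_vis


# Toy usage: 2 decoding steps over a cache whose first 8 positions are
# image tokens (all shapes and values are synthetic).
acc = None
for cur_len in (16, 17):
    attn = torch.softmax(torch.randn(1, 4, 1, cur_len), dim=-1)
    attn, acc = tarac_step(attn, acc, vision_slice=slice(0, 8))
```

Because the update only rescales existing attention rows and keeps a small per-head buffer over the image tokens, a hook like this adds essentially no latency, which is consistent with the reported ~4% TPOT overhead.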
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Object Hallucination | POPE (Random) | F1 Score | 89.01 | 285 |
| Object Hallucination | POPE (Popular) | F1 Score | 87.22 | 273 |
| Hallucination Evaluation | AMBER | CHAIR | 6.7 | 172 |
| Hallucination Evaluation | AMBER | CHAIR_s | 5 | 56 |
| Object Hallucination | POPE (Adversarial) | Accuracy | 84.72 | 55 |
| Image Captioning | CHAIR (test) | Cs Score | 43 | 22 |
| Perception Evaluation | MME Perception | Score | 1710 | 21 |
| Text Fluency Evaluation | AMBER | PPL | 113.13 | 9 |
| Hallucination Mitigation | SHR (test) | SPI | 4.93 | 9 |
| Object Hallucination | CHAIR (test) | CS (%) | 30 | 9 |