SEASON: Mitigating Temporal Hallucination in Video Large Language Models via Self-Diagnostic Contrastive Decoding

About

Video Large Language Models (VideoLLMs) have shown remarkable progress in video understanding. However, these models still struggle to effectively perceive and exploit rich temporal information in videos when responding to user queries. Therefore, they often generate descriptions of events that are temporal inconsistent or causally implausible, causing severe hallucination issues. While most prior studies have focused on spatial hallucinations (e.g. object mismatches), temporal reasoning in video understanding remains relatively underexplored. To address this issue, we propose Self-Diagnostic Contrastive Decoding (SEASON), a training-free method that adaptively enhances temporal and spatial faithfulness for each output token. It achieves this by dynamically diagnosing each token's hallucination tendency and applying adaptive contrastive decoding against its corresponding temporal and spatial negatives. Extensive experiments demonstrate that SEASON outperforms all existing training-free hallucination mitigation approaches on three hallucination examination benchmarks, while further improves VideoLLMs across four general video understanding benchmarks. The code will be released upon acceptance.

Chang-Hsun Wu, Kai-Po Chang, Yu-Yang Sheng, Hung-Kai Chung, Kuei-Chun Wang, Yu-Chiang Frank Wang• 2025

Related benchmarks

Task	Dataset	Result
Video Understanding	MVBench (test)	Accuracy69.6	190
Temporal Video Understanding	TempCompass	Accuracy74.1	141
Video Understanding	Video-MME (test)	Accuracy60.9	51
Video Hallucination Evaluation	VideoHallucer	Overall Score54.8	46
Temporal Video Understanding	TVBench (test)	Accuracy49.5	22
Hallucination-oriented Video Understanding	VidHallucer (test)	Accuracy55.1	22
Hallucination-oriented Video Understanding	EventHallusion (test)	Accuracy71.6	22
Hallucination Examination	VidHalluc, VideoHallucer, EventHallusion	VidHalluc Score78.7	17
Temporal Understanding	TempCompass, TVBench	TempCompass Score0.737	17
Conventional Video Understanding	VideoMMe, MVBench	VideoMMe Score53.4	17

Showing 10 of 12 rows

Other info

Follow for update

@wizwand_team Discord