
EventHallusion: Diagnosing Event Hallucinations in Video LLMs

About

Recently, Multimodal Large Language Models (MLLMs) have made significant progress in video comprehension. Despite the remarkable content reasoning and instruction-following capabilities they demonstrate, the hallucination problem of these VideoLLMs is less explored than its counterpart in the image domain. To bridge this gap, we propose EventHallusion, a novel benchmark that assesses VideoLLMs' hallucination toward events, the crux of video analysis. From a hallucination-attribution perspective, EventHallusion is curated to assess a VideoLLM's susceptibility to language priors and vision-language biases. We also propose a simple yet effective method, Temporal Contrastive Decoding (TCD), to tackle the hallucination problems of VideoLLMs. TCD rectifies the model's bias toward its priors during the decoding stage by comparing the original video with a modified version in which temporal cues are disrupted. Through a comprehensive evaluation of eight open-source and two closed-source VideoLLMs on EventHallusion, we observe that the open-source models suffer significantly from hallucination, whereas the closed-source ones perform markedly better. Equipping open-source VideoLLMs with TCD yields evident performance improvements across most EventHallusion metrics. Our code and benchmark data are available at https://github.com/Stevetich/EventHallusion.
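The core decoding step of TCD can be illustrated with a minimal sketch. The abstract only states that the original video's logits are contrasted against those from a temporally disrupted copy; the specific `(1 + alpha) * l_orig - alpha * l_disrupted` weighting below is the standard contrastive-decoding form and is an assumption here, as are the function name and the toy logits:

```python
import numpy as np

def temporal_contrastive_decode(logits_orig, logits_disrupted, alpha=0.5):
    """Contrast next-token logits from the original video against those from a
    temporally disrupted copy (e.g., frames with their order perturbed).

    Tokens whose scores survive only because of language priors receive similar
    logits in both passes and are suppressed; tokens supported by genuine
    temporal evidence keep their advantage.
    """
    # Standard contrastive-decoding combination (the exact weighting is an
    # assumption, not taken from the paper's abstract).
    adjusted = (1 + alpha) * logits_orig - alpha * logits_disrupted
    # Softmax over the vocabulary, shifted for numerical stability.
    exp = np.exp(adjusted - adjusted.max())
    probs = exp / exp.sum()
    return int(np.argmax(probs)), probs

# Toy example: token 2 is supported by temporal cues in the original video,
# while token 1 is inflated by the model's prior (so it also scores highly
# on the temporally disrupted copy).
logits_orig = np.array([1.0, 2.0, 3.0, 0.5])
logits_disrupted = np.array([1.0, 2.5, 1.0, 0.5])
token, probs = temporal_contrastive_decode(logits_orig, logits_disrupted)
```

In this toy case the prior-driven token 1 loses ground after the contrast, and the temporally grounded token 2 is selected.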

Jiacheng Zhang, Yang Jiao, Shaoxiang Chen, Na Zhao, Zhiyu Tan, Hao Li, Xingjun Ma, Jingjing Chen • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Video Question Answering | ActivityNet-QA (test) | Accuracy: 53 | 275 |
| Video Hallucination Evaluation | VideoHallucer | ORH: 6 | 25 |
| Temporal Understanding | TempCompass, TVBench | TempCompass Score: 0.734 | 17 |
| Hallucination Examination | VidHalluc, VideoHallucer, EventHallusion | VidHalluc Score: 74.8 | 17 |
| Conventional Video Understanding | VideoMMe, MVBench | VideoMMe Score: 53 | 17 |
| Hallucination Evaluation | EventHallusion binary QA (test) | Accuracy: 0.649 | 15 |
| Hallucination Examination | EventHallusion | Average Score: 68.46 | 15 |
| Video Understanding and Reasoning | Video-MME (test) | Overall Accuracy: 59.7 | 15 |
| Hallucination Evaluation | VRIPT-HAL (test) | F1 Score: 49.3 | 15 |
| Video Understanding and Reasoning | Video-MMMU (test) | Overall Score: 0.466 | 15 |

Showing 10 of 14 rows.
