HalLoc: Token-level Localization of Hallucinations for Vision Language Models

About

Hallucinations pose a significant challenge to the reliability of large vision-language models, making their detection essential for ensuring accuracy in critical applications. Current detection methods often rely on computationally intensive models, leading to high latency and resource demands. Their definitive outcomes also fail to account for real-world scenarios where the line between hallucinated and truthful information is unclear. To address these issues, we propose HalLoc, a dataset designed for efficient, probabilistic hallucination detection. It features 150K token-level annotated samples, including hallucination types, across Visual Question Answering (VQA), instruction-following, and image captioning tasks. This dataset facilitates the development of models that detect hallucinations with graded confidence, enabling more informed user interactions. Additionally, we introduce a baseline model trained on HalLoc, offering low-overhead, concurrent hallucination detection during generation. The model can be seamlessly integrated into existing VLMs, improving reliability while preserving efficiency. The prospect of a robust plug-and-play hallucination detection module opens new avenues for enhancing the trustworthiness of vision-language models in real-world applications. The HalLoc dataset and code are publicly available at: https://github.com/dbsltm/cvpr25_halloc.
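The abstract describes a low-overhead module that assigns each generated token a hallucination probability while the VLM is still decoding, rather than issuing a single yes/no verdict afterwards. Below is a minimal Python sketch of that idea. It assumes a linear probe over per-token hidden states, which is a hypothetical stand-in because this page does not specify the baseline's architecture. The names `TokenLabel`, `LinearHallucinationHead`, `graded_label` and `detect_stream`, the `"object"` type tag, and the 0.3/0.7 thresholds are all illustrative assumptions, not HalLoc's actual API or settings.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Sequence, Tuple


@dataclass
class TokenLabel:
    """One token of a hypothetical token-level annotated sample."""
    token: str
    hallucinated: bool
    hal_type: Optional[str] = None  # hypothetical type tag, e.g. "object"


def sigmoid(x: float) -> float:
    """Numerically stable logistic function (no overflow for large |x|)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


class LinearHallucinationHead:
    """Hypothetical plug-in probe: p(hallucinated | h) = sigmoid(w . h + b)."""

    def __init__(self, weights: Sequence[float], bias: float = 0.0) -> None:
        self.weights = list(weights)
        self.bias = bias

    def prob(self, hidden: Sequence[float]) -> float:
        if len(hidden) != len(self.weights):
            raise ValueError("hidden-state size does not match probe weights")
        z = sum(w * h for w, h in zip(self.weights, hidden)) + self.bias
        return sigmoid(z)


def graded_label(p: float, low: float = 0.3, high: float = 0.7) -> str:
    """Bucket a probability into a graded confidence label (thresholds assumed)."""
    if p >= high:
        return "likely hallucinated"
    if p > low:
        return "uncertain"
    return "likely truthful"


def detect_stream(
    head: LinearHallucinationHead,
    stream: Iterable[Tuple[str, Sequence[float]]],
) -> Iterator[Tuple[str, float, str]]:
    """Score each (token, hidden_state) pair as soon as the VLM emits it."""
    for token, hidden in stream:
        p = head.prob(hidden)
        yield token, p, graded_label(p)


# Usage with made-up 2-dim hidden states and a made-up gold annotation.
head = LinearHallucinationHead(weights=[2.0, -1.0])
fake_stream = [("A", [0.0, 0.0]), ("dog", [2.0, 0.0]), ("sits", [-1.0, 1.0])]
gold = [TokenLabel("A", False), TokenLabel("dog", True, "object"), TokenLabel("sits", False)]
for (token, p, label), g in zip(detect_stream(head, fake_stream), gold):
    print(f"{token}: {p:.2f} ({label}); gold hallucinated={g.hallucinated}")
# A: 0.50 (uncertain); gold hallucinated=False
# dog: 0.98 (likely hallucinated); gold hallucinated=True
# sits: 0.05 (likely truthful); gold hallucinated=False
```

Because the probe only reads hidden states the VLM already computes, scoring adds one dot product per token; this is the kind of cost profile a "plug-and-play" detector aims for, though the real module may differ.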

Eunkyu Park, Minyeong Kim, Gunhee Kim · 2025

Related benchmarks

Task	Dataset	Result
Hallucination Evaluation	POPE	--	217
Token-level hallucination detection	MS COCO image captioning (test)	Precision 71	27
Probability Calibration	HalLoc-Caption (test)	Object ECE 0.04	12
Probability Calibration	HalLoc Instruct	Object ECE 0.11	12
Probability Calibration	HalLoc VQA	Object ECE 4.28	12
Object Hallucination Detection	MSCOCO Gemma 3 (test)	AUC 79.27	8
Object Hallucination Detection	MSCOCO Qwen3-VL 3 (test)	AUC 83.85	8
Object Hallucination Detection	MSCOCO Average performance across VLMs (test)	AUC 81.17	8
Object Hallucination Detection	AMBER out-of-distribution (OOD)	AUC 0.5	8
Object Hallucination Detection	MSCOCO LLaVA 1.5 (test)	AUC 80.38	8
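The table reports two kinds of metrics: expected calibration error (ECE) for how well predicted hallucination probabilities match observed hallucination rates, and ROC AUC for how well those probabilities separate hallucinated from grounded objects. The Python sketch below shows common textbook definitions applied to binary hallucination labels. The choice of 10 equal-width bins and the "mean predicted probability vs. observed hallucination fraction" ECE variant are assumptions; this page does not state the benchmark's exact evaluation protocol, and the function names are illustrative.

```python
from typing import Sequence


def expected_calibration_error(
    probs: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> float:
    """Equal-width-bin ECE for binary hallucination probabilities.

    In each bin, compare the mean predicted probability with the observed
    fraction of hallucinated items; weight each gap by the bin's share of items.
    """
    if len(probs) != len(labels) or not probs:
        raise ValueError("probs and labels must be non-empty and equally long")
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 lands in the last bin
        bins[idx].append((p, y))
    n = len(probs)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        mean_conf = sum(p for p, _ in b) / len(b)
        frac_pos = sum(y for _, y in b) / len(b)
        ece += (len(b) / n) * abs(frac_pos - mean_conf)
    return ece


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as P(random positive scores above random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 1 = hallucinated object, 0 = grounded object (made-up data).
probs = [0.9, 0.4, 0.6, 0.2]
labels = [1, 1, 0, 0]
print(round(expected_calibration_error(probs, labels), 4))  # 0.375
print(roc_auc(probs, labels))  # 0.75
```

Lower ECE means better-calibrated probabilities, while an AUC of 0.5 corresponds to chance-level ranking. The pairwise AUC loop is O(P·N) and is meant for clarity; a rank-based computation scales better on large evaluation sets.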

