Counterfactual Segmentation Reasoning: Diagnosing and Mitigating Pixel-Grounding Hallucination

About

Segmentation Vision-Language Models (VLMs) have significantly advanced grounded visual understanding, yet they remain prone to pixel-grounding hallucinations, producing masks for incorrect objects or for objects that are entirely absent. Existing evaluations rely almost entirely on text- or label-based perturbations, which check only whether the predicted mask matches the queried label. Such evaluations overlook the spatial footprint and severity of hallucination and therefore fail to reveal vision-driven hallucinations, which are more challenging and more prevalent. To address this gap, we formalize the task of Counterfactual Segmentation Reasoning (CSR), where a model must segment the referenced object in the factual image and abstain in its counterfactual counterpart. To support this task, we curate HalluSegBench, the first large-scale benchmark to diagnose referring and reasoning expression segmentation hallucinations using controlled visual counterfactuals, alongside new evaluation metrics that measure hallucination severity and disentangle vision- and language-driven failure modes. We further introduce RobustSeg, a segmentation VLM trained with counterfactual fine-tuning (CFT) to learn when to segment and when to abstain. Experimental results confirm RobustSeg reduces hallucinations by 30%, while improving segmentation performance on FP-RefCOCO(+/g).
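The CSR setup described above can be illustrated with a minimal sketch: score the prediction on the factual image against the ground-truth mask, then check how much of the object's original region the model still segments in the counterfactual image, where it should abstain. The function names, the toy masks, and the "hallucinated fraction" score are assumptions made for illustration; they are not the evaluation metrics defined by HalluSegBench.

```python
from typing import List

# Binary mask as rows of 0/1 values; 1 marks a pixel assigned to the object.
Mask = List[List[int]]


def iou(pred: Mask, target: Mask) -> float:
    """Intersection over union between a predicted and a ground-truth mask."""
    inter = sum(p & t for pr, tr in zip(pred, target) for p, t in zip(pr, tr))
    union = sum(p | t for pr, tr in zip(pred, target) for p, t in zip(pr, tr))
    # Two empty masks agree perfectly (correct abstention).
    return inter / union if union else 1.0


def hallucinated_fraction(pred_cf: Mask, region: Mask) -> float:
    """Share of the object's original region still segmented in the
    counterfactual image (hypothetical score; 0.0 means full abstention there)."""
    area = sum(sum(row) for row in region)
    overlap = sum(p & r for pr, rr in zip(pred_cf, region) for p, r in zip(pr, rr))
    return overlap / area if area else 0.0


# Toy 2x3 example: the referenced object covers four pixels in the factual image.
target = [[0, 1, 1],
          [0, 1, 1]]
pred_factual = [[0, 1, 1],
                [0, 0, 1]]          # misses one object pixel
pred_counterfactual = [[0, 0, 1],
                       [0, 0, 0]]   # object was edited out, one pixel still predicted

factual_iou = iou(pred_factual, target)
cf_hallucination = hallucinated_fraction(pred_counterfactual, target)
print(factual_iou, cf_hallucination)
```

A model that handles both images well would combine a high factual IoU with a hallucinated fraction near zero. Because the second score is an area ratio, it reflects how much of the image is affected and not only whether a mask was produced.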

Xinzhuo Li, Adheesh Juvekar, Jiaxun Zhang, Xingyou Liu, Muntasir Wahed, Kiet A. Nguyen, Yifan Shen, Tianjiao Yu, Ismini Lourentzou • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Referring Segmentation | FP-RefCOCO | Segment Score 59.57 | 9 |
| Referring Segmentation | RefCOCOg FP | Segment Score 54.76 | 9 |
| Reasoning Segmentation | HALLUSEGBENCH Reasoning | CMS Factual 0.1541 | 9 |
| Referring Segmentation | HALLUSEGBENCH Referring | CMS Factual 10.62 | 9 |
| Localization | FP-RefCOCO | See Score 83.37 | 6 |
| Localization | FP-RefCOCO+ | See 83 | 6 |
| Localization | FP-RefCOCOg | See 84.21 | 6 |
| Segmentation | FP-RefCOCO+ | Segmentation Score 52.91 | 6 |