
POPEN: Preference-Based Optimization and Ensemble for LVLM-Based Reasoning Segmentation

About

Existing LVLM-based reasoning segmentation methods often suffer from imprecise segmentation results and hallucinations in their text responses. This paper introduces POPEN, a novel framework designed to address both issues. POPEN includes a preference-based optimization method to fine-tune the LVLM, aligning it more closely with human preferences and thereby producing better text responses and segmentation results. Additionally, POPEN introduces a preference-based ensemble method for inference, which integrates multiple outputs from the LVLM using a preference-score-based attention mechanism for refinement. To better adapt to the segmentation task, we incorporate several task-specific designs in the POPEN framework, including a new approach for collecting segmentation preference data with a curriculum learning mechanism, and a novel preference optimization loss to refine the segmentation capability of the LVLM. Experiments demonstrate that our method achieves state-of-the-art performance in reasoning segmentation, exhibiting minimal hallucination in text responses and the highest segmentation accuracy compared to previous advanced methods such as LISA and PixelLM. The project page is https://lanyunzhu.site/POPEN/
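The abstract describes fusing multiple LVLM outputs with a "preference-score-based attention mechanism" but gives no formula. The following is a minimal sketch under our own assumptions, not POPEN's actual method: each of K candidate masks comes with one scalar preference score, a softmax turns the scores into attention weights, and the fused mask is the weighted average of per-pixel foreground probabilities, thresholded at 0.5. The names `preference_weighted_ensemble`, `temperature`, and `threshold` are hypothetical; the real mechanism may attend over features rather than final masks.

```python
import math
from typing import List, Sequence

# Hypothetical representation: per-pixel foreground probabilities in [0, 1].
Mask = List[List[float]]


def softmax(scores: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax turning preference scores into weights."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max to avoid overflow in exp
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def preference_weighted_ensemble(masks: Sequence[Mask],
                                 scores: Sequence[float],
                                 temperature: float = 1.0,
                                 threshold: float = 0.5) -> List[List[int]]:
    """Fuse K candidate masks, weighting each by the softmax of its score.

    Assumption (not from the paper): attention is a single softmax over
    mask-level scores, applied identically to every pixel.
    """
    if not masks or len(masks) != len(scores):
        raise ValueError("need one preference score per candidate mask")
    weights = softmax(scores, temperature)
    height, width = len(masks[0]), len(masks[0][0])
    # Weighted average of the K probabilities at each pixel.
    fused = [[sum(w * m[i][j] for w, m in zip(weights, masks))
              for j in range(width)]
             for i in range(height)]
    # Binarize the fused probability map.
    return [[1 if p >= threshold else 0 for p in row] for row in fused]
```

With equal scores this reduces to a plain average of the candidates; a large score gap lets the most-preferred candidate dominate, and `temperature` controls how sharply that happens.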

Lanyun Zhu, Tianrun Chen, Qianxiong Xu, Xuanyi Liu, Deyi Ji, Haiyang Wu, De Wen Soh, Jun Liu• 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Referring Expression Segmentation | RefCOCO (testA) | cIoU | 82 | 257 |
| Referring Expression Segmentation | RefCOCO+ (testA) | cIoU | 77 | 230 |
| Referring Expression Segmentation | RefCOCO+ (val) | cIoU | 73.1 | 223 |
| Referring Expression Segmentation | RefCOCO (testB) | cIoU | 74.1 | 213 |
| Referring Expression Segmentation | RefCOCO (val) | cIoU | 79.3 | 212 |
| Referring Expression Segmentation | RefCOCO+ (testB) | cIoU | 65.1 | 210 |
| Reasoning Segmentation | ReasonSeg (test) | gIoU | 60.2 | 145 |
| Referring Expression Segmentation | RefCOCOg (val (U)) | cIoU | 75.4 | 89 |
| Referring Expression Segmentation | RefCOCOg (test (U)) | cIoU | 75.6 | 78 |
| Hallucination Evaluation | MMHal | | -- | 37 |

Showing 10 of 14 rows.
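The table reports cIoU for the RefCOCO-family benchmarks and gIoU for ReasonSeg, but the page does not define them. The sketch below assumes the definitions commonly used in LISA-style evaluation: gIoU is the mean of per-image IoUs, while cIoU is the cumulative intersection divided by the cumulative union over the whole split. Treating an empty prediction and an empty ground truth as IoU 1.0 is our own convention, and `giou_ciou` is a hypothetical helper name.

```python
from typing import List, Sequence, Tuple

Mask = List[List[int]]  # binary mask, 1 = foreground


def intersection_union(pred: Mask, gt: Mask) -> Tuple[int, int]:
    """Count foreground pixels in the intersection and the union."""
    inter = union = 0
    for prow, grow in zip(pred, gt):
        for p, g in zip(prow, grow):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    return inter, union


def giou_ciou(preds: Sequence[Mask], gts: Sequence[Mask]) -> Tuple[float, float]:
    """Return (gIoU, cIoU) over a list of prediction/ground-truth pairs."""
    if not preds or len(preds) != len(gts):
        raise ValueError("need the same, non-zero number of predictions and targets")
    per_sample = []
    total_inter = total_union = 0
    for pred, gt in zip(preds, gts):
        inter, union = intersection_union(pred, gt)
        # Assumed convention: both masks empty counts as a perfect match.
        per_sample.append(inter / union if union else 1.0)
        total_inter += inter
        total_union += union
    giou = sum(per_sample) / len(per_sample)  # every image weighs equally
    ciou = total_inter / total_union if total_union else 1.0  # large objects weigh more
    return giou, ciou
```

For example, a small object with IoU 1/2 (1 of 2 union pixels) and a larger one with IoU 6/9 give gIoU = 7/12 ≈ 0.583 but cIoU = 7/11 ≈ 0.636, which illustrates why the two metrics are not directly comparable across rows of the table.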
