POPEN: Preference-Based Optimization and Ensemble for LVLM-Based Reasoning Segmentation
About
Existing LVLM-based reasoning segmentation methods often suffer from imprecise segmentation results and hallucinations in their text responses. This paper introduces POPEN, a novel framework designed to address both issues. POPEN includes a preference-based optimization method to fine-tune the LVLM, aligning it more closely with human preferences and thereby generating better text responses and segmentation results. Additionally, POPEN introduces a preference-based ensemble method for inference, which integrates multiple outputs from the LVLM using a preference-score-based attention mechanism for refinement. To better adapt to the segmentation task, we incorporate several task-specific designs in our POPEN framework, including a new approach for collecting segmentation preference data with a curriculum learning mechanism, and a novel preference optimization loss to refine the segmentation capability of the LVLM. Experiments demonstrate that our method achieves state-of-the-art performance in reasoning segmentation, exhibiting minimal hallucination in text responses and the highest segmentation accuracy compared to previous advanced methods like LISA and PixelLM. Project page: https://lanyunzhu.site/POPEN/
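To make the idea of a preference-score-weighted ensemble concrete, here is a minimal sketch. It is not POPEN's attention mechanism, whose details are not given in this abstract. It assumes each candidate output is a per-pixel probability map with a scalar preference score, and fuses the candidates with plain softmax weights. The names `softmax` and `ensemble_masks`, and the `temperature` and `threshold` parameters, are hypothetical.

```python
import math


def softmax(scores, temperature=1.0):
    """Turn raw preference scores into weights that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_masks(prob_maps, pref_scores, threshold=0.5):
    """Fuse K candidate probability maps (H x W nested lists).

    Assumption: a higher preference score means a more trusted candidate.
    Returns the fused probability map and the thresholded binary mask.
    """
    weights = softmax(pref_scores)
    height, width = len(prob_maps[0]), len(prob_maps[0][0])
    fused = [
        [
            sum(w * pm[i][j] for w, pm in zip(weights, prob_maps))
            for j in range(width)
        ]
        for i in range(height)
    ]
    mask = [[1 if v >= threshold else 0 for v in row] for row in fused]
    return fused, mask


# Two hypothetical 1x2 candidates; the first has the higher score.
candidates = [[[1.0, 0.0]], [[0.0, 0.0]]]
fused, mask = ensemble_masks(candidates, pref_scores=[10.0, 0.0])
```

With a large score gap the fused mask follows the preferred candidate; with equal scores the sketch reduces to a plain average of the candidates.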
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Referring Expression Segmentation | RefCOCO (testA) | cIoU | 82 | 217 |
| Referring Expression Segmentation | RefCOCO+ (val) | cIoU | 73.1 | 201 |
| Referring Expression Segmentation | RefCOCO (testB) | cIoU | 74.1 | 191 |
| Referring Expression Segmentation | RefCOCO+ (testA) | cIoU | 77 | 190 |
| Referring Expression Segmentation | RefCOCO (val) | cIoU | 79.3 | 190 |
| Referring Expression Segmentation | RefCOCO+ (testB) | cIoU | 65.1 | 188 |
| Reasoning Segmentation | ReasonSeg (test) | gIoU | 60.2 | 102 |
| Referring Expression Segmentation | RefCOCOg (val (U)) | cIoU | 75.4 | 89 |
| Referring Expression Segmentation | RefCOCOg (test(U)) | cIoU | 75.6 | 78 |
| Reasoning Segmentation | MUSE (val) | gIoU (overall) | 48 | 21 |
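
The two metrics in the table aggregate differently. As commonly defined in the reasoning segmentation literature, gIoU is the mean of per-image IoUs, while cIoU is the cumulative intersection divided by the cumulative union over the whole dataset. The sketch below illustrates the difference on binary masks stored as nested lists. The function names are hypothetical, and scoring an empty prediction against an empty ground truth as 1.0 is an assumption; benchmark toolkits may handle that case differently.

```python
def intersection_union(pred, gt):
    """Count intersecting and union pixels of two binary masks."""
    inter = union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    return inter, union


def giou(preds, gts):
    """Mean of per-image IoUs: every image counts equally."""
    ious = []
    for pred, gt in zip(preds, gts):
        inter, union = intersection_union(pred, gt)
        # Assumption: an empty prediction for an empty target scores 1.0.
        ious.append(inter / union if union else 1.0)
    return sum(ious) / len(ious)


def ciou(preds, gts):
    """Cumulative intersection over cumulative union: large objects dominate."""
    total_inter = total_union = 0
    for pred, gt in zip(preds, gts):
        inter, union = intersection_union(pred, gt)
        total_inter += inter
        total_union += union
    return total_inter / total_union if total_union else 1.0


# Image 1: a large object segmented perfectly. Image 2: a small object missed.
preds = [[[1, 1, 1, 1]], [[1, 0, 0, 0]]]
gts = [[[1, 1, 1, 1]], [[0, 1, 0, 0]]]
g = giou(preds, gts)  # (1.0 + 0.0) / 2
c = ciou(preds, gts)  # (4 + 0) / (4 + 2)
```

Because cIoU pools pixels across images, it favors large objects, which is why the two scores diverge in this toy example.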