Bring Adaptive Binding Prototypes to Generalized Referring Expression Segmentation

About

Referring Expression Segmentation (RES), which aims to identify and segment objects based on natural language expressions, has attracted rising attention. While substantial progress has been made in RES, the emergence of Generalized Referring Expression Segmentation (GRES) introduces new challenges by allowing expressions to describe multiple objects or to lack a specific object reference. Existing RES methods usually rely on sophisticated encoder-decoder and feature fusion modules, and they struggle to generate class prototypes that match each instance individually when confronted with the complex referents and binary labels of GRES. In this paper, after reevaluating the differences between RES and GRES, we propose a novel Model with Adaptive Binding Prototypes (MABP) that adaptively binds queries to object features in the corresponding region. This enables different query vectors to match instances of different categories or different parts of the same instance, significantly expanding the decoder's flexibility, dispersing global pressure across all queries, and easing the demands on the encoder. Experimental results demonstrate that MABP significantly outperforms state-of-the-art methods on all three splits of the gRefCOCO dataset. MABP also surpasses state-of-the-art methods on the RefCOCO+ and G-Ref datasets, and achieves very competitive results on RefCOCO. Code is available at https://github.com/buptLwz/MABP

Weize Li, Zhicheng Zhao, Haochen Bai, Fei Su • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Referring Expression Segmentation | RefCOCO (testA) | cIoU 76.73 | 315 |
| Referring Expression Segmentation | RefCOCO+ (testA) | cIoU 71.76 | 288 |
| Referring Expression Segmentation | RefCOCO+ (val) | cIoU 65.99 | 272 |
| Referring Expression Segmentation | RefCOCO (val) | cIoU 74.48 | 261 |
| Referring Expression Segmentation | RefCOCO (testB) | cIoU 71.07 | 259 |
| Referring Expression Segmentation | RefCOCO+ (testB) | cIoU 57.22 | 256 |
| Referring Video Object Segmentation | MeViS (val) | J&F Score 0.434 | 166 |
| Generalized Referring Expression Segmentation | gRefCOCO (val) | cIoU 65.72 | 165 |
| Generalized Referring Expression Segmentation | gRefCOCO (testA) | cIoU 71.6 | 159 |
| Generalized Referring Expression Segmentation | gRefCOCO (testB) | cIoU 62.76 | 141 |
Showing 10 of 15 rows
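
Most rows above report cIoU, which this page does not define. In the RES and GRES benchmarks it is commonly understood as cumulative IoU: the total number of intersection pixels divided by the total number of union pixels, accumulated over every sample in a split. The sketch below illustrates that common definition only. It is not taken from the MABP code; the function name `cumulative_iou`, the nested-list mask format, and the handling of an empty union are assumptions made for illustration.

```python
from typing import List, Sequence

Mask = Sequence[Sequence[int]]  # binary mask: rows of 0/1 values (assumed format)


def cumulative_iou(preds: List[Mask], targets: List[Mask]) -> float:
    """Total intersection pixels divided by total union pixels over all samples."""
    total_inter = 0
    total_union = 0
    for pred, target in zip(preds, targets):
        for pred_row, target_row in zip(pred, target):
            for p, t in zip(pred_row, target_row):
                total_inter += int(bool(p) and bool(t))
                total_union += int(bool(p) or bool(t))
    # Assumption: if every mask is empty (union of zero), score a perfect match.
    return total_inter / total_union if total_union else 1.0


# Two toy 2x2 samples; the second is a no-target case with an empty prediction.
preds = [[[1, 1], [0, 0]], [[0, 0], [0, 0]]]
targets = [[[1, 0], [0, 0]], [[0, 0], [0, 0]]]
score = cumulative_iou(preds, targets)
print(f"cIoU = {100 * score:.2f}")  # prints "cIoU = 50.00"
```

Because pixels are pooled before dividing, large objects contribute more to cIoU than small ones, and a correctly predicted empty mask adds nothing to either total. Benchmarks may treat no-target samples differently, so the official evaluation script remains the reference for exact numbers.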
