
CoHD: A Counting-Aware Hierarchical Decoding Framework for Generalized Referring Expression Segmentation

About

The newly proposed Generalized Referring Expression Segmentation (GRES) extends the formulation of classic RES to complex multiple-target and non-target scenarios. Recent approaches address GRES by directly extending well-adopted RES frameworks with object-existence identification. However, these approaches tend to encode multi-granularity object information into a single representation, which makes it difficult to precisely represent objects of different granularity. Moreover, simple binary object-existence identification across all referent scenarios fails to capture their inherent differences, which introduces ambiguity in object understanding. To tackle these issues, we propose a Counting-Aware Hierarchical Decoding framework (CoHD) for GRES. By decoupling the intricate referring semantics into different granularities with a visual-linguistic hierarchy, and dynamically aggregating them with intra- and inter-selection, CoHD boosts multi-granularity comprehension through the reciprocal benefit of the hierarchical nature. Furthermore, we incorporate counting ability by casting multiple/single/non-target scenarios into count- and category-level supervision, facilitating comprehensive object perception. Experimental results on the gRefCOCO, Ref-ZOM, R-RefCOCO, and RefCOCO benchmarks demonstrate the effectiveness and rationality of CoHD, which outperforms state-of-the-art GRES methods by a remarkable margin. Code is available at https://github.com/RobertLuo1/CoHD.

Zhuoyan Luo, Yinghao Wu, Tianheng Cheng, Yong Liu, Yicheng Xiao, Hongfa Wang, Xiao-Ping Zhang, Yujiu Yang • 2024

Related benchmarks

Task | Dataset | Result | Rank
Referring Image Segmentation | RefCOCO (val) | -- | 259
Referring Expression Segmentation | RefCOCO (testA) | -- | 257
Referring Image Segmentation | RefCOCO+ (test-B) | -- | 252
Referring Expression Segmentation | RefCOCO+ (testA) | -- | 230
Referring Image Segmentation | RefCOCO (test A) | -- | 230
Referring Expression Segmentation | RefCOCO+ (val) | -- | 223
Referring Expression Segmentation | RefCOCO (testB) | -- | 213
Referring Expression Segmentation | RefCOCO (val) | -- | 212
Referring Expression Segmentation | RefCOCO+ (testB) | -- | 210
Referring Image Segmentation | RefCOCO+ (val) | -- | 179
Showing 10 of 25 rows
