CoHD: A Counting-Aware Hierarchical Decoding Framework for Generalized Referring Expression Segmentation
About
The recently proposed Generalized Referring Expression Segmentation (GRES) task extends the formulation of classic RES to cover complex multiple-target and no-target scenarios. Recent approaches address GRES by directly extending well-adopted RES frameworks with object-existence identification. However, these approaches tend to encode multi-granularity object information into a single representation, which makes it difficult to precisely represent objects of different granularity. Moreover, applying the same simple binary object-existence identification across all referent scenarios fails to capture their inherent differences, incurring ambiguity in object understanding. To tackle these issues, we propose a Counting-Aware Hierarchical Decoding framework (CoHD) for GRES. By decoupling the intricate referring semantics into different granularities with a visual-linguistic hierarchy, and dynamically aggregating them with intra- and inter-selection, CoHD boosts multi-granularity comprehension through the reciprocal benefit of the hierarchical nature. Furthermore, we incorporate counting ability by embodying multiple-, single-, and no-target scenarios into count- and category-level supervision, facilitating comprehensive object perception. Experimental results on the gRefCOCO, Ref-ZOM, R-RefCOCO, and RefCOCO benchmarks demonstrate the effectiveness and rationality of CoHD, which outperforms state-of-the-art GRES methods by a remarkable margin. Code is available at https://github.com/RobertLuo1/CoHD.
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Referring Expression Segmentation | RefCOCO (testA) | -- | 217 |
| Referring Expression Segmentation | RefCOCO+ (val) | -- | 201 |
| Referring Expression Segmentation | RefCOCO (testB) | -- | 191 |
| Referring Expression Segmentation | RefCOCO (val) | -- | 190 |
| Referring Expression Segmentation | RefCOCO+ (testA) | -- | 190 |
| Referring Expression Segmentation | RefCOCO+ (testB) | -- | 188 |
| Generalized Referring Expression Segmentation | gRefCOCO (testA) | cIoU 71.85 | 115 |
| Generalized Referring Expression Segmentation | gRefCOCO (val) | cIoU 65.17 | 98 |
| Generalized Referring Expression Segmentation | gRefCOCO (testB) | cIoU 62.63 | 97 |
| Referring Expression Segmentation | RefCOCOg (val (U)) | -- | 89 |
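
The gRefCOCO rows above report cIoU (cumulative IoU), which is commonly defined as the total intersection over the total union, both summed over all evaluation samples. The following is a minimal sketch of that definition, not the evaluation code used by CoHD. The function name `cumulative_iou`, the input layout (one binary mask per sample), and the value returned when the union is empty are assumptions made for illustration.

```python
import numpy as np


def cumulative_iou(preds, targets):
    """Cumulative IoU over a dataset (hypothetical helper, not CoHD's code).

    preds, targets: iterables of same-shaped binary masks, one pair per sample.
    Returns a value in [0, 1]; multiply by 100 for a percentage scale.
    """
    total_inter = 0
    total_union = 0
    for pred, target in zip(preds, targets):
        pred = np.asarray(pred, dtype=bool)
        target = np.asarray(target, dtype=bool)
        # Pixel counts are accumulated across samples before dividing,
        # so large objects weigh more than small ones.
        total_inter += int(np.logical_and(pred, target).sum())
        total_union += int(np.logical_or(pred, target).sum())
    if total_union == 0:
        # Assumption: every mask is empty (only no-target samples, all
        # predicted empty); treated here as a perfect score.
        return 1.0
    return total_inter / total_union


# Usage: one single-target sample and one no-target sample.
preds = [np.array([[1, 1], [0, 0]]), np.zeros((2, 2))]
targets = [np.array([[1, 0], [0, 0]]), np.zeros((2, 2))]
score = cumulative_iou(preds, targets)
print(f"cIoU: {100 * score:.2f}")
```

Under this definition, a no-target sample with a correctly empty prediction adds nothing to either sum, while a false-positive prediction on such a sample enlarges the union and lowers the score.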