High-Precision Dichotomous Image Segmentation via Depth Integrity-Prior and Fine-Grained Patch Strategy
About
High-precision dichotomous image segmentation (DIS) is the task of extracting fine-grained objects from high-resolution images. Existing methods face a dilemma: non-diffusion methods are efficient but suffer from false or missed detections due to weak semantics and less robust spatial priors, while diffusion methods, which draw on strong generative priors, achieve high accuracy at a high computational cost. As a solution, we find that pseudo depth information from monocular depth estimation models provides essential semantic understanding that quickly reveals spatial differences between target objects and backgrounds. Inspired by this phenomenon, we identify a novel insight that we term the depth integrity-prior: in pseudo depth maps, foreground objects consistently exhibit stable depth values with much lower variance than chaotic background patterns. To exploit this prior, we propose the Prior of Depth Fusion Network (PDFNet). Specifically, our network establishes multimodal interactive modeling to achieve depth-guided structural perception by deeply fusing RGB and pseudo depth features. We further introduce a novel depth integrity-prior loss to explicitly enforce depth consistency in segmentation results. Additionally, we design a fine-grained perception enhancement module with adaptive patch selection to perform boundary-sensitive detail refinement. Notably, PDFNet achieves state-of-the-art performance with only 94M parameters (<11% of the parameters of diffusion-based models), outperforming all non-diffusion methods and surpassing some diffusion methods. Code is provided in the supplementary materials.
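The depth integrity-prior described above is an observation about variance: depth values inside the foreground mask vary much less than those in the background. The following is a minimal sketch of how that observation could be checked on a single image. It is not the paper's depth integrity-prior loss, whose formulation is not given here. The function name `masked_depth_variance` and the toy depth map and mask are hypothetical and chosen only for illustration.

```python
from statistics import pvariance


def masked_depth_variance(depth, mask):
    """Return (foreground variance, background variance) of a depth map.

    depth: 2-D list of floats (pseudo depth values).
    mask:  2-D list of 0/1 ints with the same shape (1 = foreground).
    Hypothetical helper for illustration; not taken from the paper's code.
    """
    fg, bg = [], []
    for depth_row, mask_row in zip(depth, mask):
        for d, m in zip(depth_row, mask_row):
            (fg if m else bg).append(d)
    # pvariance needs at least one value; treat an empty region as zero variance.
    fg_var = pvariance(fg) if fg else 0.0
    bg_var = pvariance(bg) if bg else 0.0
    return fg_var, bg_var


# Toy 4x4 example (made-up values): a flat foreground square
# surrounded by a background with scattered depths.
depth = [
    [0.10, 0.90, 0.30, 0.70],
    [0.80, 0.50, 0.50, 0.20],
    [0.40, 0.50, 0.50, 0.95],
    [0.60, 0.05, 0.85, 0.15],
]
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]

fg_var, bg_var = masked_depth_variance(depth, mask)
print(f"foreground variance: {fg_var:.4f}, background variance: {bg_var:.4f}")
```

In this toy case the foreground variance is zero because all four foreground pixels share one depth value, while the background variance is clearly positive. Real pseudo depth maps would show a gap rather than an exact zero.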
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Salient Object Detection | HRSOD 400 (test) | Fw-beta Score | 0.943 | 15 |
| Dichotomous Image Segmentation | DIS 470 (val) | Fmax | 0.915 | 14 |
| Dichotomous Image Segmentation | DIS-TE1 500 (test) | Fmax | 89.1 | 14 |
| Dichotomous Image Segmentation | DIS-TE2 500 (test) | Fmax | 92 | 14 |
| Dichotomous Image Segmentation | DIS-TE3 500 (test) | Fmax | 93.6 | 14 |
| Dichotomous Image Segmentation | DIS ALL 2,000 (test) | Fmax | 91.5 | 14 |
| Dichotomous Image Segmentation | DIS-TE4 500 (test) | Fmax | 91.2 | 14 |
| Dense Instance Segmentation | DIS VD (test) | Fmax | 91.5 | 4 |
| High-Resolution Salient Object Detection | UHRSD 988 samples (test) | Fmax | 96.3 | 4 |