Toward Real-World High-Precision Image Matting and Segmentation

About

High-precision scene parsing tasks, including image matting and dichotomous segmentation, aim to accurately predict masks with extremely fine details (such as hair). Most existing methods focus on salient, single foreground objects. While interactive methods allow for target adjustment, their class-agnostic design restricts generalization across different categories. Furthermore, the scarcity of high-quality annotations has led to a reliance on inharmonious synthetic data, resulting in poor generalization to real-world scenarios. To address these issues, we propose a Foreground Consistent Learning model, dubbed FCLM. Specifically, we first introduce a Depth-Aware Distillation strategy that transfers depth-related knowledge for better foreground representation. To handle the data dilemma, we frame the processing of synthetic data as a domain adaptation problem and propose a domain-invariant learning strategy that focuses on foreground learning. To support interactive prediction, we contribute an Object-Oriented Decoder that accepts both visual and language prompts to predict the referred target. Experimental results show that our method quantitatively and qualitatively outperforms SOTA methods.

Haipeng Zhou, Zhaohu Xing, Hongqiu Wang, Jun Ma, Ping Li, Lei Zhu • 2026

Related benchmarks

Task | Dataset | Result | Rank
Dichotomous Image Segmentation | DIS5K (DIS-VD) | S_alpha = 0.909 | 30
Dichotomous Image Segmentation | DIS5K TE (1-4) (test) | Fw_beta = 89.5 | 25
Referring Image Matting | RefMatte RW100 (test) | SAD = 21.31 | 13
Multi-object Image Matting | HIM2K NATURAL (test) | IMQMSE = 83.48 | 5
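
For readers unfamiliar with the SAD figure reported for RefMatte RW100, the following is a minimal sketch of how a sum of absolute differences between a predicted and a ground-truth alpha matte is typically computed. The function name `sad`, the nested-list input format, and the default division by 1000 are assumptions made for illustration; this page does not state how the listed value was computed or scaled.

```python
def sad(pred, gt, scale=1000.0):
    """Sum of absolute differences between two alpha mattes.

    pred, gt: 2D lists (rows of floats in [0, 1]) with identical shape.
    scale: divisor applied to the raw sum; 1000 is a common reporting
           convention in matting work, assumed here rather than confirmed.
    """
    if len(pred) != len(gt) or any(len(p) != len(g) for p, g in zip(pred, gt)):
        raise ValueError("pred and gt must have the same shape")
    # Accumulate |pred - gt| over every pixel.
    total = sum(
        abs(p - g)
        for pred_row, gt_row in zip(pred, gt)
        for p, g in zip(pred_row, gt_row)
    )
    return total / scale


# Tiny hypothetical 2x2 mattes, for illustration only.
pred = [[0.0, 1.0], [0.5, 0.5]]
gt = [[0.0, 0.0], [0.5, 1.0]]
print(sad(pred, gt, scale=1.0))
```

Lower SAD means the predicted matte is closer to the ground truth.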