LawDIS: Language-Window-based Controllable Dichotomous Image Segmentation

About

We present LawDIS, a language-window-based controllable dichotomous image segmentation (DIS) framework that produces high-quality object masks. Our framework recasts DIS as an image-conditioned mask generation task within a latent diffusion model, enabling seamless integration of user controls. LawDIS is enhanced with macro-to-micro control modes. Specifically, in macro mode, we introduce a language-controlled segmentation strategy (LS) to generate an initial mask based on user-provided language prompts. In micro mode, a window-controlled refinement strategy (WR) allows flexible refinement of user-defined regions (i.e., size-adjustable windows) within the initial mask. Coordinated by a mode switcher, these modes can operate independently or jointly, making the framework well-suited for high-accuracy, personalised applications. Extensive experiments on the DIS5K benchmark reveal that our LawDIS significantly outperforms 11 cutting-edge methods across all metrics. Notably, compared to the second-best model MVANet, we achieve $F_\beta^\omega$ gains of 4.6\% with both the LS and WR strategies and 3.6\% gains with only the LS strategy on DIS-TE. Codes will be made available at https://github.com/XinyuYanTJU/LawDIS.

Xinyu Yan, Meijun Sun, Ge-Peng Ji, Fahad Shahbaz Khan, Salman Khan, Deng-Ping Fan• 2025

Related benchmarks

Task	Dataset	Result
Dichotomous Image Segmentation	DIS5K DIS-TE1 (test)	Fmax89.9	24
Dichotomous Image Segmentation	DIS5K DIS-TE4 (test)	Fmax0.922	24
Dichotomous Image Segmentation	DIS5K DIS-TE Overall (test)	Fmax Score0.918	24
Dichotomous Image Segmentation	DIS5K DIS-TE2 (test)	Fmax92.1	24
Dichotomous Image Segmentation	DIS5K DIS-TE3 (test)	Fmax0.929	24
Dichotomous Image Segmentation	DIS5K DIS-VD (val)	Weighted F-measure88.4	12

Showing 6 of 6 rows

Other info

Follow for update

@wizwand_team Discord