Integrating Extra Modality Helps Segmentor Find Camouflaged Objects Well
About
Camouflaged Object Segmentation (COS) remains challenging because camouflaged objects exhibit only subtle visual differences from their backgrounds and single-modality RGB methods provide limited cues. This has led researchers to explore multimodal data to improve segmentation accuracy. In this work, we present MultiCOS, a novel framework that effectively leverages diverse data modalities to improve segmentation performance. MultiCOS comprises two modules. The first, Bi-space Fusion Segmentor (BFSer), employs a state space and a latent space fusion mechanism to integrate cross-modal features within a shared representation, together with a fusion-feedback mechanism to refine context-specific features. The second, Cross-modal Knowledge Learner (CKLer), leverages external multimodal datasets to generate pseudo-modal inputs and establish cross-modal semantic associations, transferring knowledge to COS models when real multimodal pairs are missing. When real multimodal COS data are unavailable, CKLer yields additional segmentation gains using only non-COS multimodal sources. Experiments on standard COS benchmarks show that BFSer outperforms existing multimodal baselines with both real and pseudo-modal data. Code will be released at https://github.com/cnyvfang/MultiCOS.
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Camouflaged Object Detection | COD10K (test) | S-measure (S_alpha) | 0.88 | 224 |
| Camouflaged Object Detection | Chameleon | S-measure (S_alpha) | 92.3 | 150 |
| Camouflaged Object Detection | CAMO (test) | -- | -- | 111 |
| Camouflaged Object Detection | NC4K | M Score | 0.031 | 67 |
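
The page does not define "M Score". In COS evaluation, M usually denotes the mean absolute error (MAE) between a predicted map and the ground-truth mask, so the sketch below assumes that reading. It is an illustration only, not the authors' evaluation code; the function name `mean_absolute_error` and the nested-list input format are hypothetical choices made here.

```python
from typing import Sequence


def mean_absolute_error(
    pred: Sequence[Sequence[float]],
    mask: Sequence[Sequence[float]],
) -> float:
    """Average per-pixel |pred - mask| for two maps with values in [0, 1].

    Assumption: "M Score" means MAE; lower values indicate a better match.
    """
    if len(pred) != len(mask):
        raise ValueError("prediction and mask must have the same height")

    total = 0.0
    count = 0
    for pred_row, mask_row in zip(pred, mask):
        if len(pred_row) != len(mask_row):
            raise ValueError("prediction and mask must have the same width")
        for p, m in zip(pred_row, mask_row):
            total += abs(p - m)
            count += 1

    if count == 0:
        raise ValueError("maps must contain at least one pixel")
    return total / count


# Toy 2x2 example: two exact pixels, two pixels off by 0.5.
prediction = [[0.0, 1.0], [0.5, 0.5]]
ground_truth = [[0.0, 1.0], [1.0, 0.0]]
score = mean_absolute_error(prediction, ground_truth)
```

Under this assumption, a value such as 0.031 on NC4K would mean the predicted maps deviate from the masks by about 3.1% of the intensity range per pixel on average.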