
Modality-Agnostic Prompt Learning for Multi-Modal Camouflaged Object Detection

About

Camouflaged Object Detection (COD) aims to segment objects that blend seamlessly into complex backgrounds, with growing interest in exploiting additional visual modalities to enhance robustness through complementary information. However, most existing approaches rely on modality-specific architectures or customized fusion strategies, which limits scalability and cross-modal generalization. To address this, we propose a novel framework that generates modality-agnostic multi-modal prompts for the Segment Anything Model (SAM), enabling parameter-efficient adaptation to arbitrary auxiliary modalities and significantly improving overall performance on COD tasks. Specifically, we model multi-modal learning through interactions between a data-driven content domain and a knowledge-driven prompt domain, distilling task-relevant cues into unified prompts for SAM decoding. We further introduce a lightweight Mask Refine Module that calibrates coarse predictions by incorporating fine-grained prompt cues, yielding more accurate camouflaged object boundaries. Extensive experiments on RGB-Depth, RGB-Thermal, and RGB-Polarization benchmarks validate the effectiveness and generalization of our modality-agnostic framework.
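The abstract's pipeline (content features from RGB plus an arbitrary auxiliary modality, a prompt domain that distills them into unified prompts, and a refinement step on the coarse mask) can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's architecture: the linear projections, the single cross-attention step, the prompt count, and the sigmoid calibration are all assumptions standing in for the actual learned modules.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: a 16x16 feature map flattened to 256 tokens, 32-dim features.
d = 32
rgb_feat = rng.standard_normal((256, d))   # data-driven content domain (RGB)
aux_feat = rng.standard_normal((256, d))   # any auxiliary modality (depth/thermal/polarization)

# Modality-agnostic prompts: learnable queries attend over the concatenated
# content features with one shared projection, regardless of which auxiliary
# modality is supplied (this is what makes the prompts modality-agnostic).
n_prompts = 8
queries = rng.standard_normal((n_prompts, d))  # knowledge-driven prompt domain
content = np.concatenate([rgb_feat, aux_feat], axis=0)  # (512, d)

# Scaled dot-product attention followed by a row-wise softmax.
attn = queries @ content.T / np.sqrt(d)
attn = np.exp(attn - attn.max(axis=1, keepdims=True))
attn /= attn.sum(axis=1, keepdims=True)
prompts = attn @ content                        # unified prompts for the decoder

# A stand-in decoder produces a coarse mask; a fine-grained prompt cue
# then calibrates it (loosely mimicking the Mask Refine Module's role).
coarse = (rgb_feat @ prompts.mean(axis=0)).reshape(16, 16)
refine = (aux_feat @ prompts[0]).reshape(16, 16)
mask = 1.0 / (1.0 + np.exp(-(coarse + 0.5 * refine)))  # sigmoid calibration

print(prompts.shape, mask.shape)  # (8, 32) (16, 16)
```

Because the prompt queries and projections are shared across modalities, swapping `aux_feat` for a different sensor's features requires no architectural change, which is the scalability argument the abstract makes.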

Hao Wang, Jiqing Zhang, Xin Yang, Baocai Yin, Lu Jiang, Zetian Mi, Huibing Wang• 2026

Related benchmarks

Task                          Dataset     Metric                  Result  Rank
Camouflaged Object Detection  COD10K      S-measure (S_alpha)     0.901   178
Camouflaged Object Detection  Chameleon   S-measure (S_alpha)     88.7    150
Camouflaged Object Detection  NC4K        M score                 0.031   67
Camouflaged Object Detection  CAMO        Weighted F-beta (Fwβ)   0.862   44
Salient Object Detection      VT821       S-measure               0.944   43
Camouflaged Object Detection  VIAC RGB-T  —                       93      12
Salient Object Detection      VT1000      S-measure (S_alpha)     0.954   7
