WMoE-CLIP: Wavelet-Enhanced Mixture-of-Experts Prompt Learning for Zero-Shot Anomaly Detection

About

Vision-language models have recently shown strong generalization in zero-shot anomaly detection (ZSAD), enabling the detection of unseen anomalies without task-specific supervision. However, existing approaches typically rely on fixed textual prompts, which struggle to capture complex semantics, and focus solely on spatial-domain features, limiting their ability to detect subtle anomalies. To address these challenges, we propose a wavelet-enhanced mixture-of-experts prompt learning method for ZSAD. Specifically, a variational autoencoder is employed to model global semantic representations and integrate them into prompts to enhance adaptability to diverse anomaly patterns. Wavelet decomposition extracts multi-frequency image features that dynamically refine textual embeddings through cross-modal interactions. Furthermore, a semantic-aware mixture-of-experts module is introduced to aggregate contextual information. Extensive experiments on 14 industrial and medical datasets demonstrate the effectiveness of the proposed method.
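To make the "multi-frequency image features" idea concrete, here is a minimal sketch of a single-level 2D Haar wavelet decomposition in plain Python. It is not the authors' implementation: the abstract does not name a wavelet family, so the choice of Haar, the function name `haar_dwt2`, and the sub-band names are all assumptions made for illustration. The sketch only shows how a flat region yields zero detail coefficients, while a small local defect shows up in the high-frequency sub-bands.

```python
import math


def haar_dwt2(image):
    """Single-level orthonormal 2D Haar transform of a grid (list of rows).

    Returns four half-resolution sub-bands:
    (low, col_diff, row_diff, diag). Names are illustrative assumptions.
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2 or any(len(r) != cols for r in image):
        raise ValueError("expected a rectangular grid with even height and width")

    low, col_diff, row_diff, diag = [], [], [], []
    for i in range(0, rows, 2):
        l_row, c_row, r_row, d_row = [], [], [], []
        for j in range(0, cols, 2):
            # The four pixels of one non-overlapping 2x2 block.
            a, b = image[i][j], image[i][j + 1]
            e, f = image[i + 1][j], image[i + 1][j + 1]
            l_row.append((a + b + e + f) / 2.0)  # low-frequency content
            c_row.append((a - b + e - f) / 2.0)  # change across columns
            r_row.append((a + b - e - f) / 2.0)  # change across rows
            d_row.append((a - b - e + f) / 2.0)  # diagonal change
        low.append(l_row)
        col_diff.append(c_row)
        row_diff.append(r_row)
        diag.append(d_row)
    return low, col_diff, row_diff, diag


def energy(grid):
    """Sum of squared entries; the orthonormal transform preserves it."""
    return sum(v * v for row in grid for v in row)


# A flat 4x4 patch with one bright pixel standing in for a small defect.
patch = [[3.0] * 4 for _ in range(4)]
patch[1][2] = 7.0
low, col_diff, row_diff, diag = haar_dwt2(patch)
# Only the block containing the defect has non-zero detail coefficients:
# col_diff[0][1] == 2.0, row_diff[0][1] == -2.0, diag[0][1] == -2.0
```

In a pipeline like the one described above, sub-bands of this kind would presumably be encoded into features before interacting with the text embeddings; that step is outside the scope of this sketch.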

Peng Chen, Chao Huang • 2026

Related benchmarks

Task	Dataset	Result
Anomaly Localization	MVTec AD	Pixel AUROC 92.1	543
Anomaly Detection	VisA	AUROC 87.3	293
Anomaly Detection	MVTec AD	--	92
Anomaly Detection	Br35H	AUROC 98.1	45
Anomaly Localization	VisA	PRO 90.1	41
Anomaly Detection	BTAD	AUROC 92.6	41
Pixel-level Anomaly Detection	ColonDB	AUROC 84.3	39
Image-level Anomaly Detection	HeadCT	AUROC 98.2	37
Image-level Anomaly Detection	DTD Synthetic	AUROC 95	31
Anomaly Localization	BTAD	AUROC 93.3	29

Showing 10 of 20 rows
