WMoE-CLIP: Wavelet-Enhanced Mixture-of-Experts Prompt Learning for Zero-Shot Anomaly Detection
About
Vision-language models have recently shown strong generalization in zero-shot anomaly detection (ZSAD), enabling the detection of unseen anomalies without task-specific supervision. However, existing approaches typically rely on fixed textual prompts, which struggle to capture complex semantics, and focus solely on spatial-domain features, limiting their ability to detect subtle anomalies. To address these challenges, we propose a wavelet-enhanced mixture-of-experts prompt learning method for ZSAD. Specifically, a variational autoencoder is employed to model global semantic representations and integrate them into prompts to enhance adaptability to diverse anomaly patterns. Wavelet decomposition extracts multi-frequency image features that dynamically refine textual embeddings through cross-modal interactions. Furthermore, a semantic-aware mixture-of-experts module is introduced to aggregate contextual information. Extensive experiments on 14 industrial and medical datasets demonstrate the effectiveness of the proposed method.
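To make the multi-frequency idea concrete, the following is a minimal sketch of a single-level 2D Haar wavelet decomposition. It is an illustration under stated assumptions, not the method's implementation: the abstract does not name the wavelet family, so Haar is assumed here, and the function `haar_dwt2` is a hypothetical helper that operates on a plain grid of pixel values rather than on the model's image features. It splits the input into one low-frequency band (LL) and three high-frequency detail bands (LH, HL, HH), the kind of sub-bands in which a small local deviation shows up even when the coarse content barely changes.

```python
def haar_dwt2(image):
    """Single-level orthonormal 2D Haar transform (illustrative sketch).

    `image` is a list of equal-length rows with even height and width.
    Returns four half-resolution bands: (LL, LH, HL, HH).
    Band naming conventions vary between libraries; here LH responds to
    top/bottom differences and HL to left/right differences.
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("height and width must be even")

    ll, lh, hl, hh = [], [], [], []
    for r in range(0, rows, 2):
        ll_row, lh_row, hl_row, hh_row = [], [], [], []
        for c in range(0, cols, 2):
            # Values of one 2x2 block: top-left, top-right,
            # bottom-left, bottom-right.
            tl, tr = image[r][c], image[r][c + 1]
            bl, br = image[r + 1][c], image[r + 1][c + 1]
            ll_row.append((tl + tr + bl + br) / 2)  # local average (low frequency)
            lh_row.append((tl + tr - bl - br) / 2)  # top vs. bottom detail
            hl_row.append((tl - tr + bl - br) / 2)  # left vs. right detail
            hh_row.append((tl - tr - bl + br) / 2)  # diagonal detail
        ll.append(ll_row)
        lh.append(lh_row)
        hl.append(hl_row)
        hh.append(hh_row)
    return ll, lh, hl, hh


# A flat 4x4 patch with one deviating pixel standing in for a subtle defect.
patch = [[1.0] * 4 for _ in range(4)]
patch[0][0] = 3.0
ll, lh, hl, hh = haar_dwt2(patch)
# Only the 2x2 block containing the deviating pixel yields nonzero detail
# coefficients; every other detail coefficient is exactly zero.
```

In this toy patch the high-frequency bands are zero everywhere except at the block containing the deviating pixel, which is the intuition behind using frequency sub-bands alongside spatial features. How those sub-bands are fused with the textual embeddings is specific to the method and is not reproduced here.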
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Anomaly Localization | MVTec AD | Pixel AUROC | 92.1 | 513 |
| Anomaly Detection | VisA | AUROC | 87.3 | 261 |
| Anomaly Detection | Br35H | AUROC | 98.1 | 45 |
| Anomaly Detection | BTAD | AUROC | 92.6 | 41 |
| Pixel-level Anomaly Detection | ColonDB | AUROC | 84.3 | 39 |
| Image-level Anomaly Detection | HeadCT | AUROC | 98.2 | 37 |
| Image-level Anomaly Detection | DTD Synthetic | AUROC | 95 | 31 |
| Anomaly Localization | BTAD | AUROC | 93.3 | 29 |
| Anomaly Detection | MVTec AD | -- | -- | 29 |
| Anomaly Localization | DAGM | AUROC | 99.5 | 26 |