SSVP: Synergistic Semantic-Visual Prompting for Industrial Zero-Shot Anomaly Detection
About
Zero-Shot Anomaly Detection (ZSAD) leverages Vision-Language Models (VLMs) to enable supervision-free industrial inspection. However, existing ZSAD paradigms are constrained by single visual backbones, which struggle to balance global semantic generalization with fine-grained structural discriminability. To bridge this gap, we propose Synergistic Semantic-Visual Prompting (SSVP), which efficiently fuses diverse visual encodings to improve the model's fine-grained perception. Specifically, SSVP introduces the Hierarchical Semantic-Visual Synergy (HSVS) mechanism, which deeply integrates DINOv3's multi-scale structural priors into the CLIP semantic space. Subsequently, the Vision-Conditioned Prompt Generator (VCPG) employs cross-modal attention to guide dynamic prompt generation, enabling linguistic queries to anchor precisely to specific anomaly patterns. Furthermore, to address the discrepancy between global scoring and local evidence, the Visual-Text Anomaly Mapper (VTAM) establishes a dual-gated calibration paradigm. Extensive evaluations on seven industrial benchmarks validate the robustness of our method: SSVP achieves state-of-the-art performance with 93.0% Image-AUROC and 92.2% Pixel-AUROC on MVTec-AD, significantly outperforming existing zero-shot approaches.
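The cross-modal attention that the abstract attributes to VCPG can be pictured with a generic scaled dot-product sketch, in which prompt tokens act as queries and visual patch features act as keys and values. This is a minimal illustration under assumptions, not the authors' implementation: the function names (`cross_attention`, `condition_prompts`), the residual update, and the toy two-dimensional features are all hypothetical, and the learned projections a real model would use are omitted.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: every query row attends over all key/value rows."""
    d = len(keys[0])
    attended = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        attended.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return attended


def condition_prompts(prompt_tokens, patch_features):
    """Hypothetical residual update: prompt token plus its attended visual context."""
    context = cross_attention(prompt_tokens, patch_features, patch_features)
    return [[p + c for p, c in zip(tok, ctx)] for tok, ctx in zip(prompt_tokens, context)]


# Toy inputs (assumed values): two prompt tokens, three image patches, dimension 2.
prompt_tokens = [[1.0, 0.0], [0.0, 1.0]]
patch_features = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
conditioned = condition_prompts(prompt_tokens, patch_features)
```

In this toy setup each prompt token is pulled toward the patches it is most similar to, which is the sense in which a text query can "anchor" to image regions.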
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image-level Anomaly Detection | BTAD | AUROC | 94.2 | 39 |
| Pixel-level Anomaly Detection | VisA | AUROC | 96.2 | 30 |
| Image-level Anomaly Detection | DAGM | AUROC | 98 | 28 |
| Image-level Anomaly Detection | MVTec AD | AUROC | 93 | 28 |
| Image-level Anomaly Detection | VisA | AUC | 88.2 | 26 |
| Anomaly Segmentation | RSDD | AUROC | 99.7 | 19 |
| Image-level Anomaly Detection | DTD Synthetic | AUROC | 94 | 18 |
| Pixel-level Anomaly Detection | MVTec AD | AUROC | 92.2 | 15 |
| Image-level Anomaly Detection | RSDD | AUROC | 98.5 | 12 |
| Image-level Anomaly Detection | KSDD2 | AUROC | 96.9 | 12 |
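
For readers unfamiliar with the metric in the table, AUROC is the probability that a randomly chosen anomalous sample receives a higher score than a randomly chosen normal one, with ties counted as one half. The sketch below computes it from that pairwise definition; it illustrates the standard metric only and does not reproduce the evaluation protocol behind the numbers above. The function name `auroc` and the toy labels and scores are assumptions for illustration.

```python
def auroc(labels, scores):
    """AUROC from its pairwise definition (1 = anomalous, 0 = normal)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    # Count positive/negative pairs ranked correctly; a tie counts as half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (assumed values): four samples, two of them anomalous.
value = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Image-level AUROC applies this to one score per image, while pixel-level AUROC treats every pixel of the anomaly map as a sample. The pairwise loop is quadratic in the number of samples, so a rank-based implementation is preferable for full-resolution maps.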