KAnoCLIP: Zero-Shot Anomaly Detection through Knowledge-Driven Prompt Learning and Enhanced Cross-Modal Integration

About

Zero-shot anomaly detection (ZSAD) identifies anomalies without training samples from the target dataset, which is essential in scenarios with privacy concerns or limited data. Vision-language models such as CLIP show potential for ZSAD but have two limitations. First, they rely on manually crafted, fixed textual descriptions or anomaly prompts, which are time-consuming to write and prone to semantic ambiguity. Second, CLIP struggles with pixel-level anomaly segmentation because it focuses on global semantics rather than local details. To address these limitations, we introduce KAnoCLIP, a novel ZSAD framework that leverages vision-language models. KAnoCLIP combines general knowledge from a Large Language Model (GPT-3.5) with fine-grained, image-specific knowledge from a Visual Question Answering system (Llama3) via Knowledge-Driven Prompt Learning (KnPL). KnPL uses a knowledge-driven (KD) loss function to create learnable anomaly prompts, removing the need for fixed text prompts and enhancing generalization. KAnoCLIP also includes the CLIP visual encoder with V-V attention (CLIP-VV), Bi-Directional Cross-Attention for Multi-Level Cross-Modal Interaction (Bi-CMCI), and a Conv-Adapter. These components preserve local visual semantics, improve local cross-modal fusion, and align global visual features with textual information, enhancing pixel-level anomaly detection. KAnoCLIP achieves state-of-the-art ZSAD performance across 12 industrial and medical datasets, demonstrating superior generalization compared to existing methods.
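The abstract does not spell out how V-V attention is computed. The sketch below illustrates one common reading of the term, and it is an assumption rather than the authors' implementation: the query and key projections of ordinary self-attention are replaced by the value vectors, so each patch attends to patches with similar values, which tends to keep local detail. The function names `softmax` and `vv_attention` and the toy patch vectors are hypothetical, and the learned projections, multiple heads, and CLIP weights are all omitted.

```python
import math

def softmax(row):
    # Numerically stable softmax over a list of floats.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def vv_attention(values):
    """Single-head V-V attention: softmax(V V^T / sqrt(d)) V.

    values: list of N patch vectors, each a list of d floats.
    Returns N vectors, each a weighted average of the input vectors.
    """
    d = len(values[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for vi in values:
        # Similarity of this patch to every patch, using values as both query and key.
        scores = [scale * sum(a * b for a, b in zip(vi, vj)) for vj in values]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out.append([sum(w * vj[k] for w, vj in zip(weights, values)) for k in range(d)])
    return out

# Toy input (assumed): two similar patches and one dissimilar patch.
patches = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
attended = vv_attention(patches)
```

In this toy input the first two patches are similar, so the first output stays closer to them than to the third patch. In a real encoder the same idea would be applied to projected patch tokens inside each transformer block.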

Chengyuan Li, Suyang Zhou, Jieping Kong, Lei Qi, Hui Xue • 2025

Related benchmarks

Task                    Dataset    Metric               Result  Rank
Anomaly Segmentation    RESC       AUC                  93.5    74
Anomaly Classification  LiverCT    AUC                  78.2    72
Anomaly Classification  RESC       AUC (%)              84.8    68
Anomaly Detection       SDD        AUC                  0.862   57
Anomaly Detection       VisA       --                   --      52
Anomaly Classification  BrainMRI   --                   --      47
Anomaly Segmentation    LiverCT    --                   --      45
Anomaly Segmentation    BTAD       Average Pixel AUROC  96.5    41
Anomaly Segmentation    BrainMRI   --                   --      39
Anomaly Segmentation    MVTec AD   AUROC (Pixelwise)    0.943   33
Showing 10 of 24 rows
