VisualAD: Language-Free Zero-Shot Anomaly Detection via Vision Transformer

About

Zero-shot anomaly detection (ZSAD) requires detecting and localizing anomalies without access to target-class anomaly samples. Mainstream methods rely on vision-language models (VLMs) such as CLIP: they build hand-crafted or learned prompt sets for normal and abnormal semantics, then compute image-text similarities for open-set discrimination. While effective, this paradigm depends on a text encoder and cross-modal alignment, which can lead to training instability and parameter redundancy. This work revisits the necessity of the text branch in ZSAD and presents VisualAD, a purely visual framework built on Vision Transformers. We introduce two learnable tokens within a frozen backbone to directly encode normality and abnormality. Through multi-layer self-attention, these tokens interact with patch tokens, gradually acquiring high-level notions of normality and anomaly while guiding patches to highlight anomaly-related cues. Additionally, we incorporate a Spatial-Aware Cross-Attention (SCA) module and a lightweight Self-Alignment Function (SAF): SCA injects fine-grained spatial information into the tokens, and SAF recalibrates patch features before anomaly scoring. VisualAD achieves state-of-the-art performance on 13 zero-shot anomaly detection benchmarks spanning industrial and medical domains, and adapts seamlessly to pretrained vision backbones such as the CLIP image encoder and DINOv2. Code: https://github.com/7HHHHH/VisualAD
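
The token-based scoring idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not spell out the scoring rule, so the cosine-similarity comparison, the two-way softmax, the `temperature` value, the max-pooled image score, and every function name below are assumptions chosen for illustration. The learned tokens, SCA, and SAF are replaced by hand-picked toy vectors.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)

def patch_anomaly_scores(patches, normal_token, abnormal_token, temperature=0.07):
    """Per-patch anomaly probability from a two-way softmax (assumed rule).

    Each patch feature is compared with the normal and abnormal tokens;
    the softmax weight on the abnormal token is used as the anomaly score.
    """
    scores = []
    for patch in patches:
        s_normal = cosine(patch, normal_token) / temperature
        s_abnormal = cosine(patch, abnormal_token) / temperature
        # Subtract the max logit for numerical stability.
        m = max(s_normal, s_abnormal)
        e_normal = math.exp(s_normal - m)
        e_abnormal = math.exp(s_abnormal - m)
        scores.append(e_abnormal / (e_normal + e_abnormal))
    return scores

def image_anomaly_score(patch_scores):
    """Image-level score as the maximum patch score (assumed pooling)."""
    return max(patch_scores)

# Toy 2-D features standing in for ViT patch embeddings (hypothetical values).
normal_token = [1.0, 0.0]
abnormal_token = [0.0, 1.0]
patches = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]

patch_scores = patch_anomaly_scores(patches, normal_token, abnormal_token)
image_score = image_anomaly_score(patch_scores)
```

In this toy setup the third patch lies closest to the abnormal token, so it receives the highest score and determines the image-level score. Reshaping the per-patch scores onto the patch grid would give a coarse anomaly map for localization.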

Yanning Hou, Peiyuan Li, Zirui Liu, Yitong Wang, Yanran Ruan, Jianfeng Qiu, Ke Xu • 2026

Related benchmarks

Task | Dataset | Result | Rank
Anomaly Segmentation | MVTec AD | -- | 105
Image-level Anomaly Detection | MVTec AD | AUROC 92.2 | 82
Image-level Anomaly Detection | VisA | AUC 84.7 | 80
Image-level Anomaly Detection | BTAD | AUROC 94.9 | 54
Anomaly Segmentation (Pixel-level) | Brain AD | AUROC 96.4 | 10
Pixel-level Anomaly Localization | VisA 42 (joint evaluation protocol) | AUROC 95.8 | 8
Pixel-level Anomaly Localization | MVTec-AD 41 (joint evaluation protocol) | AUROC 91.3 | 8
Pixel-level Anomaly Localization | BTAD 43 (joint evaluation protocol) | AUROC 93.4 | 8
Anomaly Detection | Brain AD | AUROC 87.1 | 7
Anomaly Detection (Image-level) | OCT 17 | AUROC 91.2 | 3