TIPS Over Tricks: Simple Prompts for Effective Zero-shot Anomaly Detection

About

Anomaly detection identifies departures from expected behavior in safety-critical settings. When target-domain normal data are unavailable, zero-shot anomaly detection (ZSAD) leverages vision-language models (VLMs). However, CLIP's coarse image-text alignment limits both localization and detection due to (i) spatial misalignment and (ii) weak sensitivity to fine-grained anomalies; prior works compensate with complex auxiliary modules yet largely overlook the choice of backbone. We revisit the backbone and use TIPS, a VLM trained with spatially aware objectives. While TIPS alleviates CLIP's issues, it exposes a distributional gap between global and local features. We address this with decoupled prompts (fixed for image-level detection, learnable for pixel-level localization) and by injecting local evidence into the global score. Without CLIP-specific tricks, our TIPS-based pipeline improves image-level performance by 1.1-3.9% and pixel-level performance by 1.5-6.9% across seven industrial datasets, delivering strong generalization with a lean architecture. Code is available at github.com/AlirezaSalehy/Tipsomaly.
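The two ideas named in the abstract, decoupled prompts and injecting local evidence into the global score, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper or the Tipsomaly repository: the function name `anomaly_scores`, the temperature value, the use of the maximum patch score as the "local evidence", and the blending weight `alpha` are all hypothetical, and the embeddings are plain NumPy arrays standing in for VLM outputs.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def l2_normalize(x, axis=-1, eps=1e-8):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def anomaly_scores(global_feat, patch_feats, fixed_prompts, learned_prompts,
                   temperature=0.07, alpha=0.5):
    """Hypothetical scoring rule (not the authors' implementation).

    global_feat:     (D,)   image-level embedding
    patch_feats:     (N, D) patch-level embeddings
    fixed_prompts:   (2, D) text embeddings [normal, anomalous], fixed prompts
    learned_prompts: (2, D) text embeddings [normal, anomalous], learnable prompts
    """
    g = l2_normalize(global_feat)
    p = l2_normalize(patch_feats)
    t_fixed = l2_normalize(fixed_prompts)
    t_learned = l2_normalize(learned_prompts)

    # Image-level branch: probability of the "anomalous" class under fixed prompts.
    global_score = softmax(g @ t_fixed.T / temperature)[1]

    # Pixel-level branch: per-patch anomaly probability under learnable prompts.
    patch_map = softmax(p @ t_learned.T / temperature, axis=-1)[:, 1]

    # Assumed fusion: blend the strongest local response into the global score.
    image_score = (1.0 - alpha) * global_score + alpha * patch_map.max()
    return image_score, patch_map
```

With `alpha=0.0` the image score reduces to the global branch alone, so an image whose global embedding looks normal but which contains a single anomalous patch scores higher once `alpha` is raised. That is the intuition behind using local evidence to make up for weak global sensitivity to fine-grained defects.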

Alireza Salehi, Ehsan Karami, Sepehr Noey, Sahand Noey, Makoto Yamada, Reshad Hosseini, Mohammad Sabokrou • 2026

Related benchmarks

Task                             Dataset   Metric   Result   Rank
Anomaly Detection                VisA      AUROC    87.7     199
Anomaly Detection                MVTec     AUROC    93.4     65
Anomaly Detection                KSDD      AUROC    0.978    40
Image-level Anomaly Detection    DAGM      AUROC    99.7     28
Anomaly Detection                DTD       AUROC    99.4     28
Image-level Anomaly Detection    HeadCT    AUROC    92.7     24
Anomaly Localization             VisA      AUROC    95.9     23
Anomaly Detection                BTAD      AUROC    95       20
Anomaly Localization             KSDD      AUROC    99.5     19
Anomaly Localization             DTD       AUROC    99.3     19
Showing 10 of 18 rows
