TIPS Over Tricks: Simple Prompts for Effective Zero-shot Anomaly Detection

About

Anomaly detection identifies departures from expected behavior in safety-critical settings. When target-domain normal data are unavailable, zero-shot anomaly detection (ZSAD) leverages vision-language models (VLMs). However, CLIP's coarse image-text alignment limits both localization and detection due to (i) spatial misalignment and (ii) weak sensitivity to fine-grained anomalies; prior works compensate with complex auxiliary modules yet largely overlook the choice of backbone. We revisit the backbone and use TIPS, a VLM trained with spatially aware objectives. While TIPS alleviates CLIP's issues, it exposes a distributional gap between global and local features. We address this with decoupled prompts (fixed for image-level detection, learnable for pixel-level localization) and by injecting local evidence into the global score. Without CLIP-specific tricks, our TIPS-based pipeline improves image-level performance by 1.1-3.9% and pixel-level performance by 1.5-6.9% across seven industrial datasets, delivering strong generalization with a lean architecture. Code is available at github.com/AlirezaSalehy/Tipsomaly.
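The two ideas named in the abstract, decoupled prompts and injecting local evidence into the global score, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the softmax temperature of 0.07, max-pooling over the anomaly map, and the linear fusion weight are all assumptions chosen for illustration, and plain Python lists stand in for TIPS image and text embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def anomaly_prob(feature, normal_text, abnormal_text, temperature=0.07):
    """Softmax over similarities to a (normal, abnormal) prompt pair.

    Returns the probability assigned to the abnormal prompt.
    The temperature value is an assumption, not taken from the paper.
    """
    s_n = cosine(feature, normal_text) / temperature
    s_a = cosine(feature, abnormal_text) / temperature
    m = max(s_n, s_a)  # subtract the max for numerical stability
    e_n, e_a = math.exp(s_n - m), math.exp(s_a - m)
    return e_a / (e_n + e_a)

def image_score(global_feat, patch_feats, fixed_prompts, learned_prompts, weight=0.5):
    """Hypothetical fusion of a global score with local evidence.

    fixed_prompts   : (normal, abnormal) text embeddings used at image level
    learned_prompts : (normal, abnormal) text embeddings used at pixel level
    weight          : assumed linear mixing weight for the local evidence
    """
    # Image-level branch: global feature scored against the fixed prompts.
    global_score = anomaly_prob(global_feat, *fixed_prompts)
    # Pixel-level branch: each patch feature scored against the learnable prompts.
    anomaly_map = [anomaly_prob(p, *learned_prompts) for p in patch_feats]
    # Local evidence: the most anomalous patch (max-pooling is an assumption).
    local_evidence = max(anomaly_map)
    fused = (1 - weight) * global_score + weight * local_evidence
    return fused, anomaly_map
```

With toy 2-D embeddings, a globally normal-looking image that contains one strongly anomalous patch receives a raised image-level score, which is the effect the abstract attributes to injecting local evidence. The actual prompt design and fusion rule are in the repository linked above.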

Alireza Salehi, Ehsan Karami, Sepehr Noey, Sahand Noey, Makoto Yamada, Reshad Hosseini, Mohammad Sabokrou • 2026

Related benchmarks

Task	Dataset	Result
Anomaly Detection	VisA	AUROC 87.7	293
Anomaly Detection	MVTec	AUROC 93.4	105
Anomaly Detection	DTD	AUROC 99.4	55
Anomaly Detection	Br35H	AUROC 93.2	45
Anomaly Detection	BTAD	AUROC 95	41
Anomaly Detection	KSDD	AUROC 0.978	40
Pixel-level Anomaly Detection	ColonDB	AUROC 84.6	39
Image-level Anomaly Detection	HeadCT	AUROC 92.7	37
Image-level Anomaly Detection	DAGM	AUROC 99.7	33
Anomaly Localization	DTD	AUROC 99.3	27

Showing 10 of 18 rows
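
The table reports AUROC throughout. As a reminder of what that metric measures, here is a minimal sketch that computes it from its pairwise definition: the probability that a randomly chosen anomalous sample receives a higher score than a randomly chosen normal one, with ties counted as one half. The function name `auroc` and the toy labels and scores are illustrative and are not taken from the paper or its code.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels : iterable of 0 (normal) or 1 (anomalous)
    scores : iterable of anomaly scores, higher means more anomalous
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # anomalous sample ranked above normal sample
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))

# Toy example: three of the four (anomalous, normal) pairs are ordered correctly.
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The quadratic loop is fine for illustration; evaluation code for large datasets would normally use a sorted, rank-based computation or a library routine. A value such as 0.75 on this 0-1 scale corresponds to 75 on a percentage scale.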
