
FB-CLIP: Fine-Grained Zero-Shot Anomaly Detection with Foreground-Background Disentanglement

About

Fine-grained anomaly detection is crucial in industrial and medical applications, but labeled anomalies are often scarce, so detection frequently has to be performed zero-shot, which is challenging. Vision-language models such as CLIP offer a promising basis, but they suffer from entangled foreground and background features and from coarse textual semantics. We propose FB-CLIP, a framework that improves anomaly localization through multi-strategy textual representations and foreground-background separation. In the textual modality, it combines End-of-Text features, globally pooled representations, and attention-weighted token features to provide richer semantic cues. In the visual modality, multi-view soft separation along identity, semantic, and spatial dimensions, together with background suppression, reduces background interference and improves feature discriminability. Semantic Consistency Regularization (SCR) aligns image features with normal and abnormal textual prototypes, suppressing uncertain matches and enlarging the semantic gap between the two. Experiments show that FB-CLIP effectively distinguishes anomalies from complex backgrounds, achieving accurate fine-grained anomaly detection and localization in zero-shot settings.
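The abstract names ingredients shared by many CLIP-style zero-shot anomaly detectors: fusing several views of a prompt's token embeddings into one text prototype, scoring each image patch by its similarity to normal versus abnormal prototypes, and damping background responses. The sketch below is a minimal pure-Python illustration under stated assumptions, not FB-CLIP's implementation. The equal-weight fusion, the attention query, the temperature of 0.07, the per-patch foreground probability, and the multiplicative suppression are all hypothetical choices, and every function name is illustrative.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_text_features(token_feats: Sequence[Sequence[float]],
                       eot_index: int,
                       attn_query: Sequence[float]) -> Vector:
    """Hypothetical fusion of three views of one prompt's token embeddings:
    the End-of-Text token, the mean over all tokens, and an attention-weighted
    sum (weights = softmax of token/query dot products). Equal weighting is an
    assumption, not the paper's formulation."""
    dim = len(token_feats[0])
    eot = token_feats[eot_index]
    pooled = [sum(t[d] for t in token_feats) / len(token_feats) for d in range(dim)]
    weights = softmax([dot(t, attn_query) for t in token_feats])
    attended = [sum(w * t[d] for w, t in zip(weights, token_feats)) for d in range(dim)]
    fused = [(e + p + a) / 3.0 for e, p, a in zip(eot, pooled, attended)]
    return l2_normalize(fused)


def anomaly_probability(patch_feat: Sequence[float],
                        normal_proto: Sequence[float],
                        abnormal_proto: Sequence[float],
                        temperature: float = 0.07) -> float:
    """CLIP-style zero-shot score: softmax over temperature-scaled cosine
    similarities to the normal and abnormal prototypes; returns P(abnormal)."""
    f = l2_normalize(patch_feat)
    s_normal = dot(f, l2_normalize(normal_proto)) / temperature
    s_abnormal = dot(f, l2_normalize(abnormal_proto)) / temperature
    return softmax([s_normal, s_abnormal])[1]


def suppress_background(anomaly_probs: Sequence[float],
                        foreground_probs: Sequence[float]) -> Vector:
    """Hypothetical background suppression: damp each patch's anomaly score by
    its soft foreground probability (1.0 = foreground, 0.0 = background)."""
    return [p * fg for p, fg in zip(anomaly_probs, foreground_probs)]


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for real CLIP features.
    normal = [1.0, 0.0]
    abnormal = [0.0, 1.0]
    patches = [[1.0, 0.1], [0.1, 1.0], [0.2, 0.9]]
    probs = [anomaly_probability(p, normal, abnormal) for p in patches]
    # The third patch is assumed to lie on the background.
    print(suppress_background(probs, [1.0, 1.0, 0.0]))
```

Multiplying by a soft foreground probability is only one simple way to damp background responses; the paper's multi-view separation along identity, semantic, and spatial dimensions and its SCR term are not reproduced here.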

Ming Hu, Yongsheng Huo, Mingyu Dou, Jianfu Yin, Peng Zhao, Yao Wang, Cong Hu, Bingliang Hu, Quan Wang (2026)

Related benchmarks

Task                           Dataset   Metric   Result  Rank
Anomaly Detection              VisA      AUROC    89.5    261
Anomaly Detection              MVTec     AUROC    92.4    79
Anomaly Localization           MVTec     AUC      91.9    78
Anomaly Detection              Head-CT   AUROC    0.934   71
Anomaly Detection              DTD       AUROC    97.9    55
Anomaly Detection              Br35H     AUROC    97.1    45
Anomaly Localization           Real-IAD  P-AUROC  95.9    43
Anomaly Detection              BTAD      AUROC    93.2    41
Pixel-level Anomaly Detection  ColonDB   AUROC    84.2    39
Anomaly Detection              MPDD      AUROC    79.1    36

10 of 23 benchmark rows shown.
