
Res$^2$CLIP: Few-Shot Generalist Anomaly Detection with Residual-to-Residual Alignment

About

Few-shot Generalist Anomaly Detection requires models to generalize to novel categories without retraining, which is difficult in real-world scenarios where samples are scarce and categories change rapidly. Existing CLIP-based methods face two major challenges. First, coarse-grained unified text prompts struggle to adapt to fine-grained foreground-background differences, causing cross-granularity mismatch. Second, fine-tuning on auxiliary datasets disrupts CLIP's inherent open-world generalization due to domain shift, degrading cross-category generalization. To address both, we propose shifting multimodal alignment entirely into a unified residual space, where residual representations naturally eliminate fine-grained normal feature differences across regions as well as class-specific biases, resolving both problems at once. Based on this insight, we design Res$^2$CLIP, the first residual-to-residual alignment framework that symmetrically bridges the visual and text modalities within CLIP's residual space. The framework is built from a residual perspective and comprises three branches: a text prompt-based branch, a visual prompt-based branch, and a novel residual-to-residual alignment branch. All learnable optimizations are constrained to the residual domain, and the residual alignment objectives are designed to force the model to focus on relative anomaly deviations rather than on class-specific features. Experiments on multiple datasets demonstrate the effectiveness of our architecture. The code is available at https://github.com/hito2448/Res2CLIP.
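To make the idea of aligning residuals with residuals more concrete, here is a minimal NumPy sketch. It is not the authors' implementation (see the repository linked above for that); it only illustrates one plausible reading of the abstract. The assumptions are: a visual residual is a query patch feature minus its most similar few-shot normal patch feature, a text residual is an "abnormal" prompt embedding minus a "normal" prompt embedding, and the anomaly score is the cosine similarity between the two residuals. All function names and the toy features are hypothetical.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length; eps keeps all-zero vectors at zero."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def visual_residuals(query_patches, normal_patches):
    """Assumed visual residual: each query patch minus its nearest normal patch.

    query_patches:  (Nq, D) patch features of the test image
    normal_patches: (Nn, D) patch features of the few-shot normal images
    """
    q = l2_normalize(query_patches)
    n = l2_normalize(normal_patches)
    nearest = np.argmax(q @ n.T, axis=1)  # index of the most similar normal patch
    return q - n[nearest]

def text_residual(abnormal_text, normal_text):
    """Assumed text residual: abnormal prompt embedding minus normal one."""
    return l2_normalize(abnormal_text) - l2_normalize(normal_text)

def residual_alignment_scores(query_patches, normal_patches,
                              abnormal_text, normal_text):
    """Cosine similarity between each visual residual and the text residual."""
    v_res = l2_normalize(visual_residuals(query_patches, normal_patches))
    t_res = l2_normalize(text_residual(abnormal_text, normal_text))
    return v_res @ t_res  # shape (Nq,), higher means more anomalous

# Toy features standing in for CLIP embeddings (hypothetical values).
normal_patches = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
query_patches = np.array([[1.0, 0.0, 0.0], [0.6, 0.0, 0.8]])
normal_text = np.array([1.0, 0.0, 0.0])
abnormal_text = np.array([0.0, 0.0, 1.0])

scores = residual_alignment_scores(
    query_patches, normal_patches, abnormal_text, normal_text
)
```

In this toy setup the first query patch matches a normal reference exactly, so its residual vanishes and its score stays near zero, while the second patch deviates toward the "abnormal" direction and scores higher. Because both sides are differences relative to a normal reference, the score depends on the deviation rather than on the category's absolute appearance, which is the intuition the abstract describes.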

Xinyue Liu, Jianyuan Wang, Biao Leng, Shuo Zhang • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Anomaly Localization | MVTec AD | Pixel AUROC 97.4 | 534 |
| Anomaly Detection | MVTec-AD (test) | I-AUROC 97.9 | 348 |
| Anomaly Detection | VisA (test) | I-AUROC 91.1 | 148 |
| Anomaly Detection | MPDD (test) | Image-level AU-ROC 89 | 104 |
| Anomaly Localization | MPDD (test) | Pixel AUROC 0.983 | 81 |
| Anomaly Detection | BTAD (test) | Mean PRO 0.835 | 43 |
| Anomaly Classification | MVTec AD | AUROC (Classification) 97.9 | 27 |
| Anomaly Detection | VisA 1-shot | F1-max 86.1 | 23 |
| Anomaly Detection | VisA 4-shot | F1-max 87.5 | 23 |
| Anomaly Detection | DTD-Synthetic 1 (test) | Instance AUC 97.7 | 21 |
Showing 10 of 13 rows
