AnomalyDINO: Boosting Patch-based Few-shot Anomaly Detection with DINOv2
About
Recent advances in multimodal foundation models have set new standards in few-shot anomaly detection. This paper explores whether high-quality visual features alone are sufficient to rival existing state-of-the-art vision-language models. We answer this in the affirmative by adapting DINOv2 for one-shot and few-shot anomaly detection, with a focus on industrial applications. We show that this approach not only rivals existing techniques but can even outperform them in many settings. Our proposed vision-only approach, AnomalyDINO, follows the well-established patch-level deep nearest neighbor paradigm and enables both image-level anomaly prediction and pixel-level anomaly segmentation. The approach is methodologically simple and training-free and, thus, does not require any additional data for fine-tuning or meta-learning. Despite its simplicity, AnomalyDINO achieves state-of-the-art results in one- and few-shot anomaly detection (e.g., pushing the one-shot performance on MVTec-AD from an AUROC of 93.1% to 96.6%). The reduced overhead, coupled with its outstanding few-shot performance, makes AnomalyDINO a strong candidate for fast deployment, e.g., in industrial contexts.
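The patch-level deep nearest neighbor paradigm mentioned above can be illustrated with a short NumPy sketch. It is not the authors' implementation: random vectors stand in for DINOv2 patch features (a 16x16 grid of 384-dimensional vectors, mirroring ViT-S/14's feature size and 14-pixel patches), and the helper names (`build_memory_bank`, `patch_anomaly_scores`, `image_score`, `anomaly_map`), the cosine distance, the "mean of the top 1% patch distances" image score, and the nearest-neighbor upsampling of the patch grid are all assumptions chosen for illustration.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row (one patch feature) to unit length, so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def build_memory_bank(reference_patches: list) -> np.ndarray:
    """Stack the patch features of all k normal reference images into one (N, D) memory bank."""
    return l2_normalize(np.concatenate(reference_patches, axis=0))


def patch_anomaly_scores(test_patches: np.ndarray, memory_bank: np.ndarray) -> np.ndarray:
    """Cosine distance from every test patch to its nearest neighbor in the memory bank."""
    sims = l2_normalize(test_patches) @ memory_bank.T  # (P, N) cosine similarities
    return 1.0 - sims.max(axis=1)                      # (P,) nearest-neighbor distances


def image_score(patch_scores: np.ndarray, top_fraction: float = 0.01) -> float:
    """Image-level score: mean of the highest `top_fraction` of patch distances (assumed aggregation)."""
    k = max(1, int(np.ceil(top_fraction * patch_scores.size)))
    return float(np.sort(patch_scores)[-k:].mean())


def anomaly_map(patch_scores: np.ndarray, grid_hw: tuple, patch_size: int = 14) -> np.ndarray:
    """Pixel-level map: reshape patch scores to the patch grid, then upsample by repetition."""
    grid = patch_scores.reshape(grid_hw)
    return np.repeat(np.repeat(grid, patch_size, axis=0), patch_size, axis=1)


# Synthetic stand-ins for DINOv2 patch features (hypothetical data, not real model output).
rng = np.random.default_rng(0)
grid_hw, dim = (16, 16), 384
n_patches = grid_hw[0] * grid_hw[1]
reference = [rng.normal(size=(n_patches, dim))]  # one-shot: a single normal reference image
bank = build_memory_bank(reference)

normal_test = reference[0] + 0.01 * rng.normal(size=(n_patches, dim))  # slightly perturbed normal image
anomalous_test = normal_test.copy()
anomalous_test[:10] = rng.normal(size=(10, dim))  # 10 patches replaced by unseen "defect" features

score_normal = image_score(patch_anomaly_scores(normal_test, bank))
scores_anomalous = patch_anomaly_scores(anomalous_test, bank)
score_anomalous = image_score(scores_anomalous)
amap = anomaly_map(scores_anomalous, grid_hw)  # (224, 224) segmentation map
print(f"normal: {score_normal:.4f}  anomalous: {score_anomalous:.4f}  map: {amap.shape}")
```

Because the approach is training-free, "fitting" in this sketch amounts to storing reference patch features; moving from one-shot to few-shot only adds more arrays to `reference`, which enlarges the memory bank.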
Related benchmarks
| Task | Dataset | Metric | Result (%) | Rank |
|---|---|---|---|---|
| Anomaly Localization | MVTec AD | Pixel AUROC | 98.1 | 513 |
| Anomaly Detection | MVTec-AD (test) | I-AUROC | 94.2 | 327 |
| Anomaly Detection | VisA | AUROC | 92.6 | 261 |
| Anomaly Detection | VisA (test) | -- | -- | 91 |
| Anomaly Detection | MVTec AD | Image-level AUROC | 97.7 | 52 |
| Anomaly Detection | BraTS 2021 | -- | -- | 50 |
| Anomaly Detection | MVTec AD | I-AUROC | 96.8 | 43 |
| Anomaly Detection | RESC | AUROC | 96.93 | 36 |
| Anomaly Localization | VisA | -- | -- | 35 |
| Anomaly Localization | VisA | AUROC | 97.5 | 23 |
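The benchmarks above report image-level (I-AUROC) and pixel-level (Pixel AUROC) results. As a hedged illustration of what these metrics measure, the sketch below computes AUROC through its rank interpretation: the probability that a randomly chosen anomalous sample receives a higher score than a randomly chosen normal one, with ties counted as half. The function name `auroc` and the toy inputs are hypothetical. Pixel-level scores are pooled across all test images here, which is a common convention but an assumption about the benchmarks' exact protocol.

```python
import numpy as np


def auroc(scores, labels) -> float:
    """AUROC as the probability that a random anomalous sample outscores a random normal one.

    Ties count as half. Inputs of any shape are flattened, so the same function serves
    image-level scores (one per image) and pixel-level maps (one per pixel).
    """
    scores = np.asarray(scores, dtype=float).ravel()
    labels = np.asarray(labels, dtype=bool).ravel()
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("AUROC needs at least one anomalous and one normal sample")
    diff = pos[:, None] - neg[None, :]  # all (anomalous, normal) pairs
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())


# Image-level: one score and one label (1 = anomalous) per test image.
image_scores = [0.10, 0.40, 0.35, 0.80]
image_labels = [0, 0, 1, 1]
print(auroc(image_scores, image_labels))  # 0.75: 3 of the 4 pairs are ranked correctly

# Pixel-level: anomaly maps and ground-truth masks of all test images, pooled together.
pixel_maps = np.array([[[0.1, 0.2], [0.1, 0.9]],
                       [[0.0, 0.1], [0.8, 0.7]]])
pixel_masks = np.array([[[0, 0], [0, 1]],
                        [[0, 0], [1, 1]]])
print(auroc(pixel_maps, pixel_masks))  # 1.0: every defect pixel outscores every normal pixel
```

The pairwise form needs memory proportional to the number of (anomalous, normal) pairs, so it only suits small inputs; evaluating millions of pixels would normally use a sort-based routine such as scikit-learn's `roc_auc_score`.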