VT-ADL: A Vision Transformer Network for Image Anomaly Detection and Localization
About
We present a transformer-based network for image anomaly detection and localization. The proposed model combines a reconstruction-based approach with patch embedding. Using a transformer network preserves the spatial information of the embedded patches, which are then processed by a Gaussian mixture density network to localize anomalous regions. We also release BTAD, a real-world industrial anomaly dataset. We compare our results with other state-of-the-art algorithms on publicly available datasets such as MNIST and MVTec.
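The abstract describes splitting an image into patches and scoring them with a Gaussian mixture density to localize anomalies. The sketch below illustrates only that final scoring idea, under loudly stated simplifications: raw pixel values stand in for the transformer's patch encodings, and the mixture weights, means and variances are fixed, hypothetical values rather than parameters predicted by a trained mixture density network. The function names (`extract_patches`, `gmm_nll`, `anomaly_map`) are illustrative and are not taken from the authors' code.

```python
import math
from typing import List, Sequence

Image = List[List[float]]


def extract_patches(image: Image, patch: int) -> List[List[float]]:
    """Split an H x W grayscale image into non-overlapping patch x patch tiles,
    each flattened row-major (the tiling a ViT-style patch embedding starts from)."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image height and width must be divisible by the patch size")
    return [
        [image[r][c] for r in range(top, top + patch) for c in range(left, left + patch)]
        for top in range(0, h, patch)
        for left in range(0, w, patch)
    ]


def gmm_nll(x: Sequence[float], weights: Sequence[float],
            means: Sequence[Sequence[float]], variances: Sequence[Sequence[float]]) -> float:
    """Negative log-likelihood of x under a diagonal-covariance Gaussian mixture.
    Component log-densities are combined with log-sum-exp to avoid underflow."""
    log_terms = []
    for pi, mu, var in zip(weights, means, variances):
        log_p = math.log(pi)
        for xi, mi, vi in zip(x, mu, var):
            log_p -= 0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
        log_terms.append(log_p)
    top = max(log_terms)
    return -(top + math.log(sum(math.exp(t - top) for t in log_terms)))


def anomaly_map(image: Image, patch: int, weights, means, variances) -> List[List[float]]:
    """Score every patch by its mixture NLL and arrange the scores on the patch grid.
    Higher values mark patches that are less likely under the hypothetical 'normal' mixture."""
    cols = len(image[0]) // patch
    scores = [gmm_nll(p, weights, means, variances) for p in extract_patches(image, patch)]
    return [scores[i:i + cols] for i in range(0, len(scores), cols)]


# Toy example: a 4x4 image of zeros with one bright 2x2 patch in the top-right corner,
# scored against a single standard-normal component (hypothetical stand-in for a trained MDN).
img = [[0.0, 0.0, 5.0, 5.0],
       [0.0, 0.0, 5.0, 5.0],
       [0.0, 0.0, 0.0, 0.0],
       [0.0, 0.0, 0.0, 0.0]]
amap = anomaly_map(img, 2, weights=[1.0], means=[[0.0] * 4], variances=[[1.0] * 4])
for row in amap:
    print([round(s, 2) for s in row])
# prints:
# [3.68, 53.68]
# [3.68, 3.68]
```

The bright patch receives by far the highest negative log-likelihood, so it stands out in the patch-level map. Working in log space is a deliberate choice: products of many small Gaussian densities underflow quickly as the feature dimension grows, while sums of log-densities stay well behaved.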
Pankaj Mishra, Riccardo Verk, Daniele Fornasier, Claudio Piciarelli, Gian Luca Foresti • 2021
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Anomaly Localization | MVTec-AD (test) | -- | -- | 181 |
| Anomaly Detection | BTAD | Average Image-level AUROC | 83.7 | 45 |
| Anomaly Segmentation | BTAD | Average Pixel AUROC | 90 | 41 |
| Anomaly Detection | BTAD (test) | Mean Pixel AUROC | 0.9 | 30 |
| Anomaly Localization | BTAD | -- | -- | 20 |
| Anomaly Localization | BTAD (test) | Pixel AUROC (01) | 99 | 13 |
| Anomaly Detection | BTAD | PR-AUC | 99 | 12 |
| Anomaly Classification | MNIST | Class 0 AUC | 0.99 | 9 |
| Anomaly Segmentation | BTAD Category 1 (test) | AUROC | 76.3 | 5 |
| Anomaly Segmentation | BTAD Category 2 (test) | AUROC | 88.9 | 5 |