VT-ADL: A Vision Transformer Network for Image Anomaly Detection and Localization

About

We present a transformer-based image anomaly detection and localization network. Our proposed model is a combination of a reconstruction-based approach and patch embedding. The use of transformer networks helps to preserve the spatial information of the embedded patches, which are later processed by a Gaussian mixture density network to localize the anomalous areas. In addition, we also publish BTAD, a real-world industrial anomaly dataset. Our results are compared with other state-of-the-art algorithms using publicly available datasets like MNIST and MVTec.
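To make the localization idea concrete, here is a minimal NumPy sketch of scoring image patches by their negative log-likelihood under a Gaussian mixture. It is an illustration under stated assumptions, not the authors' implementation: raw pixel patches stand in for the transformer-encoded patch features, the mixture parameters are fixed toy values instead of being predicted by a trained mixture density network, and the helper names `split_into_patches` and `gmm_negative_log_likelihood` are hypothetical.

```python
import numpy as np


def split_into_patches(image, patch):
    """Split an (H, W) image into non-overlapping, flattened patches."""
    h, w = image.shape
    gh, gw = h // patch, w // patch
    # Crop so that both sides are exact multiples of the patch size.
    image = image[:gh * patch, :gw * patch]
    patches = image.reshape(gh, patch, gw, patch).transpose(0, 2, 1, 3)
    return patches.reshape(gh * gw, patch * patch), (gh, gw)


def gmm_negative_log_likelihood(x, weights, means, sigmas):
    """Per-row negative log-likelihood under an isotropic Gaussian mixture.

    x: (N, D) features, weights: (K,), means: (K, D), sigmas: (K,).
    """
    d = x.shape[1]
    sq_dist = ((x[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)  # (N, K)
    log_comp = (
        np.log(weights)[None, :]
        - 0.5 * d * np.log(2.0 * np.pi * sigmas ** 2)[None, :]
        - sq_dist / (2.0 * sigmas ** 2)[None, :]
    )
    # Log-sum-exp over mixture components for numerical stability.
    m = log_comp.max(axis=1, keepdims=True)
    log_prob = m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))
    return -log_prob


# Toy data (assumption): a noisy flat image with one synthetic defect
# that covers exactly one 8x8 patch.
rng = np.random.default_rng(0)
image = rng.normal(0.0, 0.1, size=(32, 32))
image[8:16, 16:24] += 3.0

features, grid = split_into_patches(image, patch=8)

# Toy mixture describing "normal" patches (assumption: in a trained model
# these parameters would come from the mixture density network).
weights = np.array([0.5, 0.5])
means = np.zeros((2, features.shape[1]))
sigmas = np.array([0.1, 0.2])

scores = gmm_negative_log_likelihood(features, weights, means, sigmas)
anomaly_map = scores.reshape(grid)  # one score per patch, laid out spatially
row, col = np.unravel_index(anomaly_map.argmax(), grid)
print(int(row), int(col))
```

Because each score stays tied to its patch position, reshaping the scores to the patch grid gives a coarse anomaly map. In this toy setup the highest score falls on the patch at grid row 1, column 2, which is where the synthetic defect was placed.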

Pankaj Mishra, Riccardo Verk, Daniele Fornasier, Claudio Piciarelli, Gian Luca Foresti • 2021

Related benchmarks

Task	Dataset	Result
Anomaly Localization	MVTec-AD (test)	--	211
Anomaly Segmentation	BTAD	Average Pixel AUROC: 90	54
Anomaly Detection	BTAD	Average Image-level AUROC: 83.7	45
Anomaly Detection	BTAD (test)	--	43
Anomaly Localization	BTAD	--	29
Anomaly Localization	BTAD (test)	Avg Pixel AUROC: 90	24
Anomaly Detection	BTAD	PR-AUC: 99	12
Anomaly Classification	MNIST	Class 0 AUC: 0.99	9
Anomaly Segmentation	BTAD Category 1 (test)	AUROC: 76.3	5
Anomaly Segmentation	BTAD Category 2 (test)	AUROC: 88.9	5

Showing 10 of 13 rows
