Contrastive Augmented Transformer with Domain-specific Enhancement for Robust Multi-scenario Metal Surface Defect Detection
About
Metal surface defect detection is critical for maintaining product quality in industrial manufacturing. However, it faces significant challenges, including limited annotated data, difficulty in identifying subtle multi-scale defects, and poor generalization across diverse scenarios. To address these issues, this paper proposes a novel Contrastive Augmented Transformer (CAT) framework for robust defect detection. CAT employs a hierarchical Swin Transformer backbone and redesigns the feature pyramid network to effectively fuse low-level textures with high-level semantics, enabling precise modeling of subtle and multi-scale defect patterns. To enhance robustness under real-world noise conditions, we propose a domain-specific droplet augmentation algorithm. Furthermore, we incorporate a hard negative mining strategy into the contrastive loss to strengthen the model's discrimination ability in ambiguous defect regions. Experimental results on the KolektorSDD2 dataset demonstrate that CAT achieves a pixel-level AUROC of 99.54%, outperforming existing methods. In addition, CAT exhibits superior generalization and robustness on three unseen datasets, including KSDD1, MTD for tile defects, and MSDD for rail surface defects, demonstrating its potential for wide-scale industrial deployment.
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| Surface Defect Detection | Rail Surface Defect Detection (RSDD) | Pixel-level AUROC97.89 | 8 | |
| Surface Defect Detection | KolektorSDD1 | Image-level AUROC55.2 | 8 | |
| Surface Defect Detection | Magnetic Tile Defects (MTD) | Image-level AUROC56 | 8 | |
| Surface Defect Detection | KolektorSDD 2 (test) | Image-level AUROC93.8 | 8 |