Bidirectional Multimodal Prompt Learning with Scale-Aware Training for Few-Shot Multi-Class Anomaly Detection
About
Few-shot multi-class anomaly detection is crucial in real industrial settings, where only a few normal samples are available while numerous object types must be inspected. This setting is challenging because defect patterns vary widely across categories while normal samples remain scarce. Existing approaches based on vision-language models typically depend on class-specific anomaly descriptions or auxiliary modules, limiting both scalability and computational efficiency. In this work, we propose AnoPLe, a lightweight multimodal prompt learning framework that removes reliance on anomaly-type textual descriptions and avoids any external modules. AnoPLe employs bidirectional interactions between textual and visual prompts, allowing class semantics and instance-level cues to refine one another and form class-conditioned representations that capture shared normal patterns across categories. To enhance localization, we design a scale-aware prefix trained on both global and local views, enabling the prompts to capture both global context and fine-grained details. In addition, an alignment loss propagates local anomaly evidence to global features, strengthening the consistency between pixel- and image-level predictions. Despite its simplicity, AnoPLe achieves strong performance on MVTec-AD, VisA, and Real-IAD under the few-shot multi-class setting, surpassing prior approaches while remaining efficient and free from expert-crafted anomaly descriptions. Moreover, AnoPLe generalizes well to unseen anomalies and extends effectively to the medical domain.
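The abstract says an alignment loss ties pixel-level evidence to the image-level prediction but does not give its form. The sketch below is only an illustration of that general idea, not the authors' loss: it assumes, hypothetically, that the image-level score is pulled toward the peak of the pixel-level anomaly map with a squared penalty. The function name `alignment_loss` and both arguments are made up for this example.

```python
def alignment_loss(global_score, local_map):
    """Hypothetical consistency penalty (an assumption, not the paper's loss).

    global_score: image-level anomaly score in [0, 1].
    local_map:    2D list of pixel-level anomaly scores in [0, 1].
    Returns the squared gap between the image-level score and the
    largest pixel-level score.
    """
    peak = max(max(row) for row in local_map)  # strongest local evidence
    return (global_score - peak) ** 2


# An image scored as mostly normal (0.2) while one pixel is strongly
# anomalous (0.9) yields a large penalty.
inconsistent = alignment_loss(0.2, [[0.1, 0.9], [0.3, 0.4]])

# When the image-level score matches the peak, the penalty vanishes.
consistent = alignment_loss(0.9, [[0.1, 0.9], [0.3, 0.4]])
```

Under this assumed form, minimizing the penalty discourages an image from being scored as normal while its anomaly map contains a confident defect region, which is the kind of pixel/image consistency the abstract describes.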
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Anomaly Localization | MVTec AD | Pixel AUROC 96.5 | 513 |
| Anomaly Detection | MVTec AD | Image-level AUROC 96.4 | 52 |
| Anomaly Detection | VisA | -- | 52 |
| Anomaly Localization | Real-IAD | P-AUROC 97.4 | 43 |
| Anomaly Localization | VisA | -- | 35 |
| Anomaly Detection | Retina OCT | Image-level AUROC 0.914 | 22 |
| Anomaly Detection | Real-IAD | AUROC (Image-level) 0.832 | 18 |
| Anomaly Detection | BMAD Liver CT | I-AUC 74.8 | 6 |
| Anomaly Localization | BMAD Brain MRI | P-AUC 97.1 | 6 |
| Anomaly Localization | BMAD Retinal OCT | P-AUC 97 | 6 |
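The table reports image-level and pixel-level AUROC. As a rough guide to what these two metrics measure, here is a minimal pure-Python sketch. It uses the standard pairwise (Mann-Whitney) definition of AUROC; the helper names `auroc`, `image_auroc`, and `pixel_auroc` are illustrative, and the exact evaluation protocol behind the numbers above is not stated on this page.

```python
def auroc(scores, labels):
    """AUROC as the fraction of (anomalous, normal) pairs ranked correctly.

    Ties count as half. The pairwise loop is quadratic, which is fine for
    a small illustration but too slow for full-resolution anomaly maps.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one anomalous and one normal sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def image_auroc(image_scores, image_labels):
    """Image-level AUROC: one score and one 0/1 label per image."""
    return auroc(image_scores, image_labels)


def pixel_auroc(anomaly_maps, masks):
    """Pixel-level AUROC: flatten every map and mask, then pool all pixels."""
    scores = [v for amap in anomaly_maps for row in amap for v in row]
    labels = [v for mask in masks for row in mask for v in row]
    return auroc(scores, labels)


# Four images, two of them anomalous; one anomalous image is scored
# below a normal one, so 3 of the 4 pairs are ranked correctly.
img = image_auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])

# One 2x2 anomaly map whose defect pixels all outscore the normal pixels.
pix = pixel_auroc([[[0.1, 0.9], [0.2, 0.8]]], [[[0, 1], [0, 1]]])
```

These functions return a fraction in [0, 1]. The table mixes both conventions (for example 96.5 and 0.914), so multiply by 100 when comparing against the percentage-style entries.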