ML-Decoder: Scalable and Versatile Classification Head
About
In this paper, we introduce ML-Decoder, a new attention-based classification head. ML-Decoder predicts the existence of class labels via queries and makes better use of spatial data than global average pooling. By redesigning the decoder architecture and using a novel group-decoding scheme, ML-Decoder is highly efficient and scales well to thousands of classes. Compared to using a larger backbone, ML-Decoder consistently provides a better speed-accuracy trade-off. ML-Decoder is also versatile: it can be used as a drop-in replacement for various classification heads, and it generalizes to unseen classes when operated with word queries. Novel query augmentations further improve its generalization ability. Using ML-Decoder, we achieve state-of-the-art results on several classification tasks: on MS-COCO multi-label, we reach 91.4% mAP; on NUS-WIDE zero-shot, we reach 31.1% ZSL mAP; and on ImageNet single-label, we reach a new top score of 80.7% with a vanilla ResNet50 backbone, without extra data or distillation. Public code is available at: https://github.com/Alibaba-MIIL/ML_Decoder
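To make the idea of query-based prediction concrete, here is a minimal NumPy sketch of a cross-attention head with group decoding. It is an illustration under stated assumptions, not the official implementation: it assumes a single attention head, no self-attention among the queries, and that each of K learnable queries is mapped by its own linear layer to a block of ceil(N/K) class logits. All names (`group_decode`, `group_fc`, the toy sizes) are hypothetical.

```python
import numpy as np

# Illustrative sketch only: single-head cross-attention with "group decoding".
# Assumed design (not taken from the official repository): K learnable queries,
# no self-attention between queries, and a per-group linear layer that turns
# each query's output into G = ceil(N / K) class logits.

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def group_decode(tokens, queries, w_q, w_k, w_v, group_fc, num_classes):
    """
    tokens:        (HW, D) flattened spatial features from the backbone
    queries:       (K, D) learnable group queries, with K <= num_classes
    w_q, w_k, w_v: (D, D) cross-attention projections
    group_fc:      (K, D, G) per-group output weights (hypothetical layout)
    returns:       (num_classes,) logits
    """
    q, k, v = queries @ w_q, tokens @ w_k, tokens @ w_v
    # Each query attends over every spatial position, instead of reading a
    # single pooled vector as a global-average-pooling head would.
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # (K, HW)
    out = attn @ v                                           # (K, D)
    # Group fully-connected step: query i only scores its own block of G classes.
    logits = np.einsum("kd,kdg->kg", out, group_fc)          # (K, G)
    return logits.reshape(-1)[:num_classes]                  # drop padding slots

# Toy example: a 7x7 feature map with 64 channels, 80 classes, 20 groups.
rng = np.random.default_rng(0)
HW, D, N, K = 49, 64, 80, 20
G = -(-N // K)  # ceil(N / K) = 4
tokens = rng.standard_normal((HW, D))
queries = rng.standard_normal((K, D))
w_q, w_k, w_v = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))
group_fc = rng.standard_normal((K, D, G)) / np.sqrt(D)

logits = group_decode(tokens, queries, w_q, w_k, w_v, group_fc, N)
print(logits.shape)  # (80,)
```

In this sketch, the attention cost grows with the number of queries K rather than with the number of classes N; setting K = N gives one query per class, while a smaller K trades per-class queries for cheaper decoding.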
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Image Classification | CIFAR-100 (test) | -- | 3518 |
| Multi-Label Classification | PASCAL VOC 2007 (test) | mAP 96.6 | 125 |
| Multi-label image recognition | MS-COCO 2014 (val) | mAP 91.1 | 51 |
| Multi-Label Classification | NUS-WIDE 925/81 (unseen) | mAP 31.1 | 43 |
| Multi-Label Classification | NUS-WIDE | mAP 33.7 | 38 |
| Multi-Label Classification | COCO 2014 (test) | mAP 66.9 | 31 |
| Multi-Label Classification | MS-COCO (test) | mAP 91.4 | 24 |
| Multi-label Image Classification | MS-COCO (test) | mAP 43.84 | 24 |
| Multi-Label Classification | NUS-WIDE | mAP 67.07 | 21 |
| Multi-Label Classification | COCO originally multi-label (test val) | mAP 91.1 | 15 |
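Most results in this table are reported as mAP (mean average precision): for each class, images are ranked by that class's score and an average precision is computed, and the per-class values are then averaged. The pure-Python sketch below uses one common, non-interpolated definition of AP (the mean of precision@k over the ranks k of the positive images). Official evaluation scripts for some benchmarks may differ (older PASCAL VOC protocols, for example, use 11-point interpolated AP), and the function names are hypothetical.

```python
def average_precision(scores, labels):
    """Non-interpolated AP for one class: mean of precision@k at each positive's rank k."""
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    hits, precisions = 0, []
    for k, (_, is_positive) in enumerate(ranked, start=1):
        if is_positive:
            hits += 1
            precisions.append(hits / k)  # precision of the top-k list
    return sum(precisions) / len(precisions) if precisions else 0.0

def mean_average_precision(score_matrix, label_matrix):
    """mAP in percent; rows are images, columns are classes. Classes without positives are skipped."""
    num_classes = len(score_matrix[0])
    aps = []
    for c in range(num_classes):
        col_scores = [row[c] for row in score_matrix]
        col_labels = [row[c] for row in label_matrix]
        if any(col_labels):
            aps.append(average_precision(col_scores, col_labels))
    return 100.0 * sum(aps) / len(aps)

# Four images, two classes (toy numbers, not benchmark data).
scores = [[0.9, 0.1], [0.8, 0.7], [0.3, 0.6], [0.2, 0.4]]
labels = [[1, 0], [0, 1], [1, 1], [0, 0]]
print(round(mean_average_precision(scores, labels), 2))  # 91.67
```

Here class 0 has AP = (1/1 + 2/3) / 2 = 5/6 and class 1 ranks both of its positives first (AP = 1.0), so the mAP is 100 × (5/6 + 1) / 2 ≈ 91.67.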