
ML-Decoder: Scalable and Versatile Classification Head

About

In this paper, we introduce ML-Decoder, a new attention-based classification head. ML-Decoder predicts the existence of class labels via queries, and enables better utilization of spatial data compared to global average pooling. By redesigning the decoder architecture and using a novel group-decoding scheme, ML-Decoder is highly efficient and scales well to thousands of classes. Compared to using a larger backbone, ML-Decoder consistently provides a better speed-accuracy trade-off. ML-Decoder is also versatile: it can be used as a drop-in replacement for various classification heads, and it generalizes to unseen classes when operated with word queries. Novel query augmentations further improve its generalization ability. Using ML-Decoder, we achieve state-of-the-art results on several classification tasks: on MS-COCO multi-label, we reach 91.4% mAP; on NUS-WIDE zero-shot, we reach 31.1% ZSL mAP; and on ImageNet single-label, with a vanilla ResNet50 backbone, we reach a new top score of 80.7%, without extra data or distillation. Public code is available at: https://github.com/Alibaba-MIIL/ML_Decoder
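
To make the idea of query-based classification more concrete, here is a minimal pure-Python sketch. It is not the authors' implementation (see the repository linked above for that). It assumes single-head cross-attention with no feed-forward layers, random placeholder weights, and a reading of "group-decoding" in which each group query yields one attended vector that is expanded into several class logits. All function names and dimensions (`cross_attend`, `group_decode`, `D`, `N`, `K`, `G`) are hypothetical and chosen only for illustration.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def global_average_pool(tokens):
    """Baseline head input: average all spatial tokens, discarding location."""
    n = len(tokens)
    return [sum(col) / n for col in zip(*tokens)]


def cross_attend(query, tokens):
    """One query attends over all spatial tokens (single head, assumed)."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, t) / scale for t in tokens])
    return [sum(w * t[d] for w, t in zip(weights, tokens))
            for d in range(len(query))]


def group_decode(tokens, group_queries, group_weights):
    """Each group query gives one attended vector, expanded to several logits.

    group_weights[g] holds one weight vector per class assigned to group g
    (an assumed stand-in for a group fully-connected layer).
    """
    logits = []
    for query, class_rows in zip(group_queries, group_weights):
        attended = cross_attend(query, tokens)
        logits.extend(dot(row, attended) for row in class_rows)
    return logits


# Hypothetical sizes: embedding dim, 7x7 spatial tokens, classes, group queries.
D, N, K, G = 8, 49, 12, 3
random.seed(0)


def rand_vec():
    return [random.gauss(0, 1) for _ in range(D)]


tokens = [rand_vec() for _ in range(N)]           # backbone feature map, flattened
group_queries = [rand_vec() for _ in range(G)]    # would be learned or fixed in practice
group_weights = [[rand_vec() for _ in range(K // G)] for _ in range(G)]

logits = group_decode(tokens, group_queries, group_weights)
probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]  # independent per-class sigmoid
print(len(probs))  # one probability per class
```

In this sketch the number of attention computations grows with the number of group queries `G` rather than with the number of classes `K`, which is one way to read the scaling claim in the abstract. A query of all zeros produces uniform attention weights, so `cross_attend` then reduces to `global_average_pool`; non-zero queries let each group weight spatial locations differently.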

Tal Ridnik, Gilad Sharir, Avi Ben-Cohen, Emanuel Ben-Baruch, Asaf Noy • 2021

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Image Classification | CIFAR-100 (test) | -- | 3518 |
| Multi-Label Classification | PASCAL VOC 2007 (test) | mAP 96.6 | 125 |
| Multi-label image recognition | MS-COCO 2014 (val) | mAP 91.1 | 51 |
| Multi-Label Classification | NUS-WIDE 925/81 (unseen) | mAP (Mean Average Precision) 31.1 | 43 |
| Multi-Label Classification | NUS-WIDE | mAP 33.7 | 38 |
| Multi-Label Classification | COCO 2014 (test) | mAP 66.9 | 31 |
| Multi-Label Classification | MS-COCO (test) | mAP 91.4 | 24 |
| Multi-label Image Classification | MS-COCO (test) | mAP 43.84 | 24 |
| Multi-Label Classification | NUS-WIDE | mAP 67.07 | 21 |
| Multi-Label Classification | COCO originally multi-label (test val) | mAP 91.1 | 15 |
