Query2Label: A Simple Transformer Way to Multi-Label Classification

About

This paper presents a simple and effective approach to the multi-label classification problem. The proposed approach leverages Transformer decoders to query the existence of a class label. The use of Transformers is rooted in the need to extract local discriminative features adaptively for different labels, which is a strongly desired property due to the existence of multiple objects in one image. The built-in cross-attention module in the Transformer decoder offers an effective way to use label embeddings as queries to probe and pool class-related features from a feature map computed by a vision backbone for subsequent binary classifications. Compared with prior works, the new framework is simple, using standard Transformers and vision backbones, and effective, consistently outperforming all previous works on five multi-label classification datasets, including MS-COCO, PASCAL VOC, NUS-WIDE, and Visual Genome. In particular, we establish $91.3\%$ mAP on MS-COCO. We hope its compact structure, simple implementation, and superior performance serve as a strong baseline for multi-label classification tasks and future studies. The code will be available soon at https://github.com/SlongLiu/query2labels.
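
The idea of using label embeddings as queries that pool class-related features from a backbone feature map can be illustrated with a small sketch. This is not the authors' implementation: it assumes a single attention head with no learned projections, no self-attention between queries, and no feed-forward or normalization layers, and all names (`query_labels`, `label_queries`, `class_weights`) and sizes are hypothetical. It uses only the Python standard library so that it runs anywhere.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def query_labels(label_queries, features, class_weights, class_biases):
    """Cross-attention pooling followed by one binary head per label.

    label_queries: K vectors of size d (one embedding per label)
    features:      N vectors of size d (flattened feature map, N = H * W)
    class_weights: K vectors of size d (one linear head per label)
    class_biases:  K floats
    Returns K independent probabilities in (0, 1).
    """
    d = len(features[0])
    probs = []
    for query, weight, bias in zip(label_queries, class_weights, class_biases):
        # Scaled dot-product attention of this label's query over all locations.
        attn = softmax([dot(query, f) / math.sqrt(d) for f in features])
        # Label-specific feature: attention-weighted average of the feature map.
        pooled = [sum(a * f[j] for a, f in zip(attn, features)) for j in range(d)]
        # Sigmoid rather than softmax: labels are not mutually exclusive.
        logit = dot(weight, pooled) + bias
        probs.append(1.0 / (1.0 + math.exp(-logit)))
    return probs


# Hypothetical sizes: 3 labels, a 4x4 feature map, 8 channels.
K, N, d = 3, 16, 8
rng = random.Random(0)


def rand_vec():
    return [rng.gauss(0.0, 1.0) for _ in range(d)]


queries = [rand_vec() for _ in range(K)]
features = [rand_vec() for _ in range(N)]
weights = [rand_vec() for _ in range(K)]
biases = [0.0] * K

probs = query_labels(queries, features, weights, biases)
print(probs)  # one probability per label
```

Each label gets its own attention map over spatial locations, so different labels can pool from different image regions. In a trained model the queries, heads, and projections would be learned, typically with a per-label binary loss.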

Shilong Liu, Lei Zhang, Xiao Yang, Hang Su, Jun Zhu • 2021

Related benchmarks

Task                              Dataset                  Result        Rank
Multi-Label Classification        PASCAL VOC 2007 (test)   mAP 97.3      125
Multi-Label Classification        NUS-WIDE (test)          mAP 70.1      112
Multi-Label Classification        MS-COCO 2014 (test)      mAP 91.3      81
Pedestrian Attribute Recognition  PA-100K                  mA 80.72      79
Multi-label Image Classification  VOC 2012 (test)          mAP 96.6      72
Multi-label image recognition     MS-COCO 2014 (val)       mAP 90.3      51
Multi-Label Classification        MS-COCO (val)            mAP 90.5      47
Pedestrian Attribute Recognition  PA-100K (test)           mA 80.72      40
Multi-label recognition           PASCAL VOC 2007 (test)   Avg. mAP 96   25
Multi-Label Classification        MS-COCO (test)           mAP 89.2      24
Showing 10 of 19 rows
