General Multi-label Image Classification with Transformers

About

Multi-label image classification is the task of predicting a set of labels corresponding to objects, attributes or other entities present in an image. In this work we propose the Classification Transformer (C-Tran), a general framework for multi-label image classification that leverages Transformers to exploit the complex dependencies among visual features and labels. Our approach consists of a Transformer encoder trained to predict a set of target labels given an input set of masked labels and visual features from a convolutional neural network. A key ingredient of our method is a label mask training objective that uses a ternary encoding scheme to represent the state of the labels as positive, negative, or unknown during training. Our model shows state-of-the-art performance on challenging datasets such as COCO and Visual Genome. Moreover, because our model explicitly represents the uncertainty of labels during training, it is more general: it can produce improved results for images with partial or extra label annotations during inference. We demonstrate this additional capability on the COCO, Visual Genome, News500, and CUB image datasets.
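The label mask training idea described above can be illustrated with a minimal sketch. It is not the authors' implementation. The integer codes for the three states, the helper names `mask_labels` and `masked_bce`, and the choice to average the loss only over the hidden labels are all assumptions made for illustration, and the probabilities stand in for the output of a real model.

```python
import math
import random

# Hypothetical integer codes for the three label states (an assumption).
UNKNOWN, NEGATIVE, POSITIVE = -1, 0, 1


def mask_labels(labels, num_known, rng):
    """Build a ternary state vector from binary labels.

    `num_known` randomly chosen labels keep their true value
    (NEGATIVE or POSITIVE); every other label is marked UNKNOWN.
    """
    known_idx = set(rng.sample(range(len(labels)), num_known))
    return [labels[i] if i in known_idx else UNKNOWN for i in range(len(labels))]


def masked_bce(probs, labels, states):
    """Binary cross-entropy averaged over the labels hidden from the model."""
    eps = 1e-12  # guards against log(0)
    losses = [
        -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))
        for p, y, s in zip(probs, labels, states)
        if s == UNKNOWN
    ]
    return sum(losses) / len(losses) if losses else 0.0


rng = random.Random(0)
labels = [1, 0, 0, 1, 0]               # ground-truth binary labels
states = mask_labels(labels, num_known=2, rng=rng)
probs = [0.9, 0.2, 0.1, 0.6, 0.3]      # stand-in for model predictions
loss = masked_bce(probs, labels, states)
```

Under the same assumptions, inference with partial annotations would pass the known labels as NEGATIVE or POSITIVE and the rest as UNKNOWN, while inference with no annotations would set every state to UNKNOWN.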

Jack Lanchantin, Tianlu Wang, Vicente Ordonez, Yanjun Qi • 2020

Related benchmarks

Task | Dataset | Result | Rank
Multi-Label Classification | PASCAL VOC 2007 (test) | mAP 91.3 | 125
Multi-Label Classification | MS-COCO 2014 (test) | mAP 85.1 | 81
Pedestrian Attribute Recognition | PA-100K | mA 81.53 | 79
Multi-Label Classification | MS-COCO (val) | mAP 85.1 | 47
Pedestrian Attribute Recognition | PA-100K (test) | mA 81.53 | 40
Multi-Label Classification | COCO 2014 (test) | mAP 66.3 | 31
Multi-label image recognition | MS-COCO (val) | CP 86.3 | 23
Multi-label recognition | MS-COCO (val) | F1 Score (All) 79.9 | 18
Multilabel Classification | mediamill (test) | Macro F1 Score 54 | 15
Multi-Label Classification | Yeast (test) | Micro-F1 78.2 | 15
