DualCoOp: Fast Adaptation to Multi-Label Recognition with Limited Annotations

About

Solving multi-label recognition (MLR) for images in the low-label regime is a challenging task with many real-world applications. Recent work learns an alignment between textual and visual spaces to compensate for insufficient image labels, but loses accuracy because of the limited amount of available MLR annotations. In this work, we utilize the strong alignment of textual and visual features pretrained with millions of auxiliary image-text pairs and propose Dual Context Optimization (DualCoOp) as a unified framework for partial-label MLR and zero-shot MLR. DualCoOp encodes positive and negative contexts with class names as part of the linguistic input (i.e., prompts). Since DualCoOp introduces only a very light learnable overhead on top of the pretrained vision-language framework, it can quickly adapt to multi-label recognition tasks that have limited annotations and even unseen classes. Experiments on standard multi-label recognition benchmarks across two challenging low-label settings demonstrate the advantages of our approach over state-of-the-art methods.

Ximeng Sun, Ping Hu, Kate Saenko, 2022
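To make the positive/negative prompt idea concrete, here is a minimal sketch under stated assumptions, not the authors' implementation. It assumes each class already has two text embeddings, one from a positive-context prompt and one from a negative-context prompt, both produced by the frozen pretrained text encoder (stubbed here with hand-picked toy vectors). A class's presence probability is taken as a two-way softmax over the image's cosine similarity to those two embeddings. The names `score_classes`, `presence_probability`, `masked_bce`, the temperature of 0.01, and all toy vectors are hypothetical. For the partial-label setting, the sketch swaps in a plain binary cross-entropy that skips unobserved labels, which is a simplification and not necessarily the loss used in the paper.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def presence_probability(image_feat: Vector, pos_text: Vector, neg_text: Vector,
                         temperature: float = 0.01) -> float:
    """Two-way softmax over (positive, negative) prompt similarities.

    A softmax over two logits equals the sigmoid of their difference,
    so this returns P(class present | image). The temperature is an
    assumed value, not taken from the paper.
    """
    logit_pos = cosine(image_feat, pos_text) / temperature
    logit_neg = cosine(image_feat, neg_text) / temperature
    return 1.0 / (1.0 + math.exp(-(logit_pos - logit_neg)))


def score_classes(image_feat: Vector,
                  prompt_embeddings: Dict[str, Tuple[Vector, Vector]],
                  temperature: float = 0.01) -> Dict[str, float]:
    """Score every class from its (positive, negative) prompt embeddings.

    Hypothetical helper: in a real system each pair would come from
    encoding the learned positive/negative contexts plus the class name.
    """
    return {
        name: presence_probability(image_feat, pos, neg, temperature)
        for name, (pos, neg) in prompt_embeddings.items()
    }


def masked_bce(probs: Dict[str, float], labels: Dict[str, Optional[int]],
               eps: float = 1e-7) -> float:
    """Binary cross-entropy averaged over observed labels only.

    labels[name] is 1 (present), 0 (absent) or None (unobserved);
    unobserved classes contribute nothing, mimicking partial labels.
    """
    losses = []
    for name, y in labels.items():
        if y is None:
            continue  # partial-label setting: no supervision for this class
        p = min(max(probs[name], eps), 1.0 - eps)  # clip to keep log finite
        losses.append(-(y * math.log(p) + (1 - y) * math.log(1.0 - p)))
    return sum(losses) / len(losses) if losses else 0.0


if __name__ == "__main__":
    # Toy 3-d features standing in for frozen encoder outputs (hypothetical).
    image = [1.0, 0.0, 0.0]
    prompts = {
        "dog": ([1.0, 0.2, 0.0], [0.2, 1.0, 0.0]),
        "cat": ([0.0, 0.0, 1.0], [0.5, 0.0, 0.5]),
        "bird": ([0.6, 0.8, 0.0], [0.8, 0.6, 0.0]),
    }
    probs = score_classes(image, prompts)
    print(probs)
    print(masked_bce(probs, {"dog": 1, "cat": 0, "bird": None}))
```

Only the shared context vectors would be trained in this picture, which matches the abstract's point about a light learnable overhead. Under the same assumption, scoring an unseen class would amount to encoding its name with the already-learned contexts and adding the resulting pair to `prompt_embeddings`, with no retraining of the sketch's components.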

Related benchmarks

Task	Dataset	Metric	Result	#
Multi-Label Classification	NUS-WIDE (test)	mAP	45	124
Multi-label recognition	MS-COCO	mAP	83.1	87
Multi-Label Classification	VOC 07	mAP	93.2	73
Multi-label recognition	NUS-WIDE	mAP	43.6	66
Multi-Label Classification	COCO 2014	mAP	78.7	55
Multi-Label Classification	VOC 2007	mAP (Average)	93.2	52
Multi-label recognition	PASCAL VOC 2007 (test)	Avg. mAP	93.2	44
Multi-Label Classification	NUS-WIDE 925/81 (unseen)	mAP (Mean Average Precision)	43.6	43
Multi-label Image Classification	PASCAL VOC 2007	mAP	90.3	40
Multi-Label Classification	NUS-WIDE	mAP	48.65	40

Showing 10 of 37 rows
