
CAILA: Concept-Aware Intra-Layer Adapters for Compositional Zero-Shot Learning

About

In this paper, we study Compositional Zero-Shot Learning (CZSL), the problem of recognizing novel attribute-object combinations built from pre-existing concepts. Recent work applies large-scale Vision-Language Pre-trained (VLP) models such as CLIP, which have strong generalization ability. However, these methods treat the pre-trained model as a black box and focus on pre- and post-CLIP operations, so they do not mine the semantic concepts embedded in the layers inside CLIP. We propose to dive into the architecture and insert adapters, a parameter-efficient technique proven effective for large language models, into each CLIP encoder layer. We further equip the adapters with concept awareness so that concept-specific features for "object", "attribute", and "composition" can be extracted. We evaluate our method on four popular CZSL datasets, MIT-States, C-GQA, UT-Zappos, and VAW-CZSL, and achieve state-of-the-art performance on all of them.
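As a rough illustration of the idea described above (not the paper's actual code), the sketch below shows a bottleneck adapter with a residual connection and a "concept-aware" wrapper that keeps one adapter per concept branch ("attribute", "object", "composition"). All dimensions, names, and initializations are hypothetical.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


class Adapter:
    """Bottleneck adapter: down-project -> ReLU -> up-project, added residually.
    d_model and bottleneck sizes here are illustrative, not from the paper."""

    def __init__(self, d_model, bottleneck, rng):
        self.w_down = rng.standard_normal((d_model, bottleneck)) * 0.02
        self.w_up = rng.standard_normal((bottleneck, d_model)) * 0.02

    def __call__(self, h):
        # Residual connection keeps the frozen backbone's features intact.
        return h + relu(h @ self.w_down) @ self.w_up


class ConceptAwareAdapters:
    """Hypothetical per-layer module: one adapter per concept branch, so each
    forward pass extracts a concept-specific feature."""

    def __init__(self, d_model=8, bottleneck=2, seed=0):
        rng = np.random.default_rng(seed)
        self.branches = {
            c: Adapter(d_model, bottleneck, rng)
            for c in ("attribute", "object", "composition")
        }

    def __call__(self, h, concept):
        return self.branches[concept](h)


layer = ConceptAwareAdapters()
h = np.ones((1, 8))  # a stand-in for one encoder-layer hidden state
out = layer(h, "composition")
print(out.shape)  # (1, 8): adapter output matches the hidden size
```

In the real method one such concept-aware module would sit inside every CLIP encoder layer, with the backbone weights frozen and only the adapter parameters trained.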

Zhaoheng Zheng, Haidong Zhu, Ram Nevatia • 2023

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Generalized Compositional Zero-Shot Learning | C-GQA (test) | AUC | 0.099 | 46 |
| Compositional Zero-Shot Learning | UT-Zappos Closed World | HM | 57 | 42 |
| Compositional Zero-Shot Learning | C-GQA Closed World | HM | 32.7 | 41 |
| Compositional Zero-Shot Learning | UT-Zappos Open World | HM | 49.4 | 38 |
| Compositional Zero-Shot Learning | MIT-States Open World | HM | 21.6 | 38 |
| Compositional Zero-Shot Learning | C-GQA Open World | HM | 11.5 | 35 |
| Compositional Zero-Shot Learning | VAW-CZSL (test) | HM | 34.6 | 14 |
| Compositional Zero-Shot Learning | MIT-States Closed World (test) | AUC | 23.4 | 12 |

Other info

Code
