
CAILA: Concept-Aware Intra-Layer Adapters for Compositional Zero-Shot Learning

About

In this paper, we study the problem of Compositional Zero-Shot Learning (CZSL), which is to recognize novel attribute-object combinations composed of pre-existing concepts. Recent work focuses on applying large-scale Vision-Language Pre-trained (VLP) models such as CLIP, which have strong generalization ability. However, these methods treat the pre-trained model as a black box and focus on pre- and post-CLIP operations, which do not inherently mine the semantic concepts between the layers inside CLIP. We propose to dive deep into the architecture and insert adapters, a parameter-efficient technique proven effective for large language models, into each CLIP encoder layer. We further equip the adapters with concept awareness so that concept-specific features of "object", "attribute", and "composition" can be extracted. We assess our method on four popular CZSL datasets, MIT-States, C-GQA, UT-Zappos, and VAW-CZSL, and it achieves state-of-the-art performance compared to existing methods on all of them.
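To make the adapter idea in the abstract concrete, the following is a minimal sketch of a bottleneck adapter with one copy per concept. It is not the authors' implementation: the class names `Adapter` and `ConceptAwareAdapters`, the ReLU bottleneck, the residual connection, and the zero-initialized up-projection are all assumptions chosen for illustration, and plain Python lists stand in for the hidden states of a CLIP encoder layer.

```python
import random


def matvec(weights, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


class Adapter:
    """Hypothetical bottleneck adapter: x + up(relu(down(x)))."""

    def __init__(self, dim, bottleneck, rng):
        # Down-projection from the hidden size to a small bottleneck.
        self.down = [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                     for _ in range(bottleneck)]
        # Assumption: zero-initialized up-projection, so the adapter starts
        # as an identity map and leaves the frozen backbone unchanged.
        self.up = [[0.0] * bottleneck for _ in range(dim)]

    def __call__(self, x):
        h = [max(0.0, v) for v in matvec(self.down, x)]  # ReLU bottleneck
        delta = matvec(self.up, h)
        return [a + b for a, b in zip(x, delta)]  # residual connection


class ConceptAwareAdapters:
    """One adapter per concept, all applied to the same hidden state."""

    CONCEPTS = ("attribute", "object", "composition")

    def __init__(self, dim, bottleneck, seed=0):
        rng = random.Random(seed)
        self.adapters = {c: Adapter(dim, bottleneck, rng)
                         for c in self.CONCEPTS}

    def __call__(self, x):
        # Returns a concept-specific feature vector for each concept.
        return {c: adapter(x) for c, adapter in self.adapters.items()}


hidden = [0.5, -1.0, 2.0, 0.0]  # toy hidden state of size 4
layer = ConceptAwareAdapters(dim=4, bottleneck=2)
features = layer(hidden)
```

In this sketch only the small `down` and `up` matrices would be trained, which is what makes adapters parameter-efficient. How the three concept-specific features are combined is left open here.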

Zhaoheng Zheng, Haidong Zhu, Ram Nevatia • 2023

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Instruction Following | Arena Hard | Win Rate 58.9 | 263 |
| Reward Modeling | RewardBench | Chat Score 93.6 | 216 |
| Instruction Following | AlpacaEval 2 | LC (%) 68.8 | 137 |
| Compositional Zero-Shot Learning | C-GQA open world | HM Score 11.5 | 65 |
| Compositional Zero-Shot Learning | UT-Zappos Closed World | HM 57 | 57 |
| Compositional Zero-Shot Learning | C-GQA Closed World | HM 32.7 | 56 |
| Compositional Zero-Shot Learning | UT-Zappos open world | HM 49.4 | 52 |
| Generalized Compositional Zero-Shot Learning | C-GQA (test) | AUC 0.099 | 46 |
| Reward Modeling | RewardBench 2 | Precise IF Score 30.9 | 41 |
| Compositional Zero-Shot Learning | MIT-States open world | HM 21.6 | 38 |
Showing 10 of 14 rows
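
The HM results listed above are, by the usual CZSL convention, assumed here to be the harmonic mean of accuracy on seen and unseen compositions. The page itself does not define the metric, so the following is a sketch under that assumption. The function names and the example accuracies are hypothetical and are not taken from the paper.

```python
def harmonic_mean(seen_acc, unseen_acc):
    """Harmonic mean of seen and unseen accuracy (both in percent)."""
    total = seen_acc + unseen_acc
    if total == 0:
        return 0.0  # avoid division by zero when both accuracies are zero
    return 2.0 * seen_acc * unseen_acc / total


def best_harmonic_mean(operating_points):
    """Best HM over (seen_acc, unseen_acc) pairs, e.g. from a bias sweep."""
    return max(harmonic_mean(s, u) for s, u in operating_points)


# Hypothetical operating points, for illustration only.
points = [(80.0, 20.0), (60.0, 40.0), (30.0, 50.0)]
best = best_harmonic_mean(points)
```

The harmonic mean rewards balanced performance: a model scoring 80 on seen and 20 on unseen compositions gets a lower HM than one scoring 60 and 40, even though both average 50.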
