CAILA: Concept-Aware Intra-Layer Adapters for Compositional Zero-Shot Learning
About
In this paper, we study the problem of Compositional Zero-Shot Learning (CZSL), which aims to recognize novel attribute-object combinations of pre-existing concepts. Recent work focuses on applying large-scale Vision-Language Pre-trained (VLP) models such as CLIP, which have strong generalization ability. However, these methods treat the pre-trained model as a black box and concentrate on pre- and post-CLIP operations, without mining the semantic concepts encoded between the layers inside CLIP. We propose to dive deep into the architecture and insert adapters, a parameter-efficient technique proven effective for large language models, into each CLIP encoder layer. We further equip the adapters with concept awareness so that concept-specific features for "object", "attribute", and "composition" can be extracted. We evaluate our method on four popular CZSL datasets, MIT-States, C-GQA, UT-Zappos, and VAW-CZSL, and achieve state-of-the-art performance on all of them.
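The core idea above can be illustrated with a short sketch: a standard bottleneck adapter (down-projection, nonlinearity, up-projection, residual connection) inserted per encoder layer, duplicated once per concept so that "object", "attribute", and "composition" features are extracted by separate branches. This is a minimal, hypothetical illustration of the general technique, not the paper's exact implementation; the dimensions, module names, and gating are assumptions.

```python
import torch
import torch.nn as nn


class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project,
    plus a residual connection around the whole block."""

    def __init__(self, dim: int, bottleneck: int):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))


class ConceptAwareAdapters(nn.Module):
    """One adapter per concept; a given forward pass routes the
    encoder-layer output through the requested concept branch.
    The three concept names follow the paper; this routing scheme
    is an illustrative assumption."""

    def __init__(self, dim: int = 512, bottleneck: int = 64):
        super().__init__()
        self.adapters = nn.ModuleDict(
            {c: Adapter(dim, bottleneck)
             for c in ("object", "attribute", "composition")}
        )

    def forward(self, x: torch.Tensor, concept: str) -> torch.Tensor:
        return self.adapters[concept](x)


# Hypothetical usage on a CLIP-like encoder-layer output
# (batch of 2, sequence length 16, hidden size 512):
layer_out = torch.randn(2, 16, 512)
module = ConceptAwareAdapters()
obj_feat = module(layer_out, "object")
attr_feat = module(layer_out, "attribute")
```

Because each adapter preserves the hidden dimension, the branches can be inserted into every encoder layer without changing the surrounding architecture, and only the small bottleneck projections are trained.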
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Generalized Compositional Zero-Shot Learning | C-GQA (test) | AUC | 0.099 | 46 |
| Compositional Zero-Shot Learning | UT-Zappos Closed World | HM | 57 | 42 |
| Compositional Zero-Shot Learning | C-GQA Closed World | HM | 32.7 | 41 |
| Compositional Zero-Shot Learning | UT-Zappos Open World | HM | 49.4 | 38 |
| Compositional Zero-Shot Learning | MIT-States Open World | HM | 21.6 | 38 |
| Compositional Zero-Shot Learning | C-GQA Open World | HM | 11.5 | 35 |
| Compositional Zero-Shot Learning | VAW-CZSL (test) | HM | 34.6 | 14 |
| Compositional Zero-Shot Learning | MIT-States Closed World (test) | AUC | 23.4 | 12 |