GraphAdapter: Tuning Vision-Language Models With Dual Knowledge Graph

About

Adapter-style efficient transfer learning (ETL) has shown excellent performance in the tuning of vision-language models (VLMs) under the low-data regime, where only a few additional parameters are introduced to excavate the task-specific knowledge based on the general and powerful representation of VLMs. However, most adapter-style works face two limitations: (i) modeling task-specific knowledge with a single modality only; and (ii) overlooking the exploitation of the inter-class relationships in downstream tasks, thereby leading to sub-optimal solutions. To mitigate that, we propose an effective adapter-style tuning strategy, dubbed GraphAdapter, which performs the textual adapter by explicitly modeling the dual-modality structure knowledge (i.e., the correlation of different semantics/classes in textual and visual modalities) with a dual knowledge graph. In particular, the dual knowledge graph is established with two sub-graphs, i.e., a textual knowledge sub-graph, and a visual knowledge sub-graph, where the nodes and edges represent the semantics/classes and their correlations in two modalities, respectively. This enables the textual feature of each prompt to leverage the task-specific structure knowledge from both textual and visual modalities, yielding a more effective classifier for downstream tasks. Extensive experimental results on 11 benchmark datasets reveal that our GraphAdapter significantly outperforms previous adapter-based methods. The code will be released at https://github.com/lixinustc/GraphAdapter

Xin Li, Dongze Lian, Zhihe Lu, Jiawang Bai, Zhibo Chen, Xinchao Wang • 2023
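
The abstract describes the dual knowledge graph only in words, so the following is a rough sketch of the idea rather than the authors' implementation. It assumes that nodes in the textual sub-graph are per-class text features, that nodes in the visual sub-graph are per-class means of few-shot image features, and that edges are softmax-normalised cosine similarities. The function names, the temperature, and the mixing weights `alpha` and `beta` are all hypothetical choices made for illustration; none of them come from the paper.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def softmax(scores, temperature):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_vector(vectors):
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def aggregate(query, nodes, temperature=0.1):
    """One message-passing step over a sub-graph.

    Assumption: the edge weight between the query and each node is the
    softmax of their cosine similarity.
    """
    weights = softmax([cosine(query, n) for n in nodes], temperature)
    return [sum(w * n[d] for w, n in zip(weights, nodes))
            for d in range(len(query))]


def adapt_text_feature(text_feat, text_nodes, visual_nodes,
                       alpha=0.7, beta=0.5):
    """Refine one text feature with knowledge from both sub-graphs.

    beta balances textual vs. visual messages; alpha keeps a residual
    share of the original feature. Both weights are hypothetical.
    """
    from_text = aggregate(text_feat, text_nodes)
    from_visual = aggregate(text_feat, visual_nodes)
    fused = [beta * t + (1 - beta) * v
             for t, v in zip(from_text, from_visual)]
    mixed = [alpha * o + (1 - alpha) * f
             for o, f in zip(text_feat, fused)]
    return l2_normalize(mixed)


def classify(image_feat, classifier):
    """Return the index of the class weight most similar to the image."""
    scores = [cosine(image_feat, w) for w in classifier]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy data: 3 classes, 3-D features (stand-ins for VLM embeddings).
text_nodes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
few_shot_images = [
    [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0]],
    [[0.1, 0.9, 0.0], [0.0, 0.8, 0.2]],
    [[0.0, 0.1, 0.9], [0.2, 0.0, 0.8]],
]
prototypes = [mean_vector(shots) for shots in few_shot_images]

# Each adapted text feature becomes one row of the classifier.
classifier = [adapt_text_feature(t, text_nodes, prototypes)
              for t in text_nodes]
predicted = classify([0.9, 0.1, 0.0], classifier)
```

The residual term reflects the abstract's framing of the adapter as a small addition on top of the pretrained representation: the original text feature is kept and only nudged by what the two sub-graphs contribute.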

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Image Classification | EuroSAT | Accuracy 85.27 | 569 |
| Image Classification | Flowers102 | Accuracy 96.23 | 558 |
| Image Classification | DTD | Accuracy 67.57 | 542 |
| Image Classification | Food-101 | Accuracy 78.63 | 542 |
| Image Classification | UCF101 | Top-1 Acc 78.8 | 455 |
| Image Classification | SUN397 | Accuracy 71.2 | 425 |
| Image Classification | ImageNet | Top-1 Accuracy 73.68 | 366 |
| Image Classification | StanfordCars | Accuracy 76.23 | 312 |
| Image Classification | Oxford-IIIT Pets | Accuracy 88.57 | 306 |
| Image Classification | FGVCAircraft | Accuracy 36.87 | 261 |
Showing 10 of 14 rows
