Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models

About

This study addresses the Domain-Class Incremental Learning problem, a realistic but challenging continual learning scenario in which both the domain distribution and the target classes vary across tasks. To handle these diverse tasks, pre-trained Vision-Language Models (VLMs) are introduced for their strong generalizability. However, this creates a new problem: the knowledge encoded in the pre-trained VLMs may be disturbed when adapting to new tasks, compromising their inherent zero-shot ability. Existing methods address this by tuning VLMs with knowledge distillation on extra datasets, which incurs heavy computational overhead. To solve the problem efficiently, we propose the Distribution-aware Interference-free Knowledge Integration (DIKI) framework, which retains the pre-trained knowledge of VLMs by avoiding information interference. Specifically, we design a fully residual mechanism that infuses newly learned knowledge into a frozen backbone while introducing minimal adverse impact on pre-trained knowledge. In addition, this residual property enables our distribution-aware integration calibration scheme, which explicitly controls the information implantation process for test data from unseen distributions. Experiments demonstrate that DIKI surpasses the current state-of-the-art approach using only 0.86% of the trained parameters and requiring substantially less training time. Code is available at: https://github.com/lloongx/DIKI
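The two ideas in the abstract, a residual branch added to a frozen backbone and a distribution-aware weight on that branch, can be illustrated with a toy sketch. This is not the DIKI implementation (see the linked repository for that); it is a minimal illustration under stated assumptions. The names `frozen_backbone`, `ResidualBranch`, `calibration_weight`, and `forward` are hypothetical, the backbone is a fixed 2x2 linear map rather than a VLM, and the Gaussian-style closeness score is an assumed stand-in for the paper's calibration scheme.

```python
import math


def frozen_backbone(x):
    # Hypothetical stand-in for a frozen pre-trained encoder: a fixed linear map.
    W = [[1.0, 0.5], [-0.5, 1.0]]
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class ResidualBranch:
    """Trainable branch whose output is ADDED to the frozen output."""

    def __init__(self, dim):
        # Zero initialization: before any training the branch contributes
        # nothing, so the model reproduces the pre-trained output exactly.
        self.weights = [[0.0] * dim for _ in range(dim)]

    def __call__(self, x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.weights]


def calibration_weight(x, mean, std):
    # Assumed closeness score in [0, 1]: equals 1 at the training mean and
    # decays as x moves away from the training feature statistics.
    z2 = sum(((xi - m) / s) ** 2 for xi, m, s in zip(x, mean, std)) / len(x)
    return math.exp(-0.5 * z2)


def forward(x, branch, mean, std):
    base = frozen_backbone(x)                 # pre-trained knowledge, untouched
    delta = branch(x)                         # newly learned knowledge
    alpha = calibration_weight(x, mean, std)  # how much new knowledge to inject
    return [b + alpha * d for b, d in zip(base, delta)]


if __name__ == "__main__":
    branch = ResidualBranch(2)
    mean, std = [0.0, 0.0], [1.0, 1.0]
    # Untrained branch: the output is identical to the frozen backbone's output.
    print(forward([1.0, 2.0], branch, mean, std) == frozen_backbone([1.0, 2.0]))
    # After "training" (weights set by hand here), inputs far from the training
    # statistics receive almost no residual update.
    branch.weights = [[0.1, 0.0], [0.0, 0.1]]
    print(calibration_weight([100.0, 100.0], mean, std))
```

In this sketch the frozen output is never overwritten, only added to, so setting the weight to zero for out-of-distribution inputs falls back to the pre-trained behavior.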

Longxiang Tang, Zhuotao Tian, Kai Li, Chunming He, Hantao Zhou, Hengshuang Zhao, Xiu Li, Jiaya Jia • 2024

Related benchmarks

Task	Dataset	Metric	Result
Multi-Task Incremental Learning	MTIL Order II	Average Acc	74.5	92
Image Classification	MTIL task-agnostic (test)	Aircraft Accuracy	45.4	36
Multi-domain Task-Incremental Learning	MTIL Order I (test)	Average Accuracy	76.3	30
Multi-Task Incremental Learning	MTIL	Average Accuracy	76.4	20
Image Classification	VDD	Average Accuracy	65.9	20
Continual Learning	VDD	Figure of Merit (FoM)	370.8	18
Continual Learning	MTIL	FoM	6.4	18
Image Classification	MTIL Transfer (test)	Caltech101	92.9	17
Multi-domain Task-Incremental Learning	MTIL Order I	Transfer Acc	68.7	17
Multi-Domain Task-Incremental Learning (Transfer)	Multi-Domain Task-Incremental Learning Sequence (Aircraft, Caltech101, CIFAR100, DTD, Flowers, Food, StanfordCars, SUN397) 16-shot (test)	Caltech101 Accuracy	95.6	16

Showing 10 of 43 rows
