A Simple Efficiency Incremental Learning Framework via Vision-Language Model with Nonlinear Multi-Adapters

About

Incremental Learning (IL) aims to learn new tasks while preserving previously acquired knowledge. Integrating the zero-shot learning capabilities of pre-trained vision-language models into IL methods has marked a significant advancement. However, these methods face three primary challenges: (1) the need for improved training efficiency; (2) reliance on a memory bank to store previous data; and (3) the necessity of a strong backbone to augment the model's capabilities. In this paper, we propose SimE, a Simple and Efficient framework that employs a vision-language model with adapters designed specifically for the IL task. We report a remarkable phenomenon: there is a nonlinear correlation between the number of adaptive adapter connections and the model's IL capabilities. While increasing adapter connections between transformer blocks improves model performance, adding more adaptive connections within transformer blocks during smaller incremental steps does not enhance, and may even degrade the model's IL ability. Extensive experimental results show that SimE surpasses traditional methods by 9.6% on TinyImageNet and outperforms other CLIP-based methods by 5.3% on CIFAR-100. Furthermore, we conduct a systematic study to enhance the utilization of the zero-shot capabilities of CLIP. We suggest replacing SimE's encoder with a CLIP model trained on larger datasets (e.g., LAION2B) and stronger architectures (e.g., ViT-L/14).

Haihua Luo, Xuming Ran, Jiangrong Shen, Timo H\"am\"al\"ainen, Zhonghua Chen, Qi Xu, Fengyu Cong• 2026

Related benchmarks

Task	Dataset	Result
Class-incremental learning	CIFAR-100	Averaged Incremental Accuracy85.94	281
Class-incremental learning	ImageNet-100	Avg Acc79.35	82
Class-incremental learning	VTAB B0 Inc10	Last Accuracy78.04	54
Class-incremental learning	ImageNet-100 (10T)	Average Accuracy (A_T)89.77	35
Class-incremental learning	CIFAR100 50 steps (test)	Last Accuracy85.35	34
Class-incremental learning	CIFAR100 20 steps (test)	Last Accuracy86.64	33
Class-incremental learning	CUB200 (100-20)	Avg Accuracy84.98	32
Class-incremental learning	TinyImageNet 10 Steps (test)	--	15
Class-incremental learning	CIFAR100 10 Steps (test)	Average Accuracy91.66	12
Class-incremental learning	TinyImageNet 5 Steps (test)	Average Accuracy86.47	12

Showing 10 of 16 rows

Other info

Follow for update

@wizwand_team Discord