BOFA: Bridge-Layer Orthogonal Low-Rank Fusion for CLIP-Based Class-Incremental Learning

About

Class-Incremental Learning (CIL) aims to continually learn new categories without forgetting previously acquired knowledge. Vision-language models such as CLIP offer strong transferable representations via multi-modal supervision, making them promising for CIL. However, applying CLIP to CIL poses two major challenges: (1) adapting to downstream tasks often requires additional learnable modules, increasing model complexity and susceptibility to forgetting; and (2) while multi-modal representations offer complementary strengths, existing methods have yet to fully realize their potential in effectively integrating visual and textual modalities. To address these issues, we propose BOFA (Bridge-layer Orthogonal Fusion for Adaptation), a novel framework for CIL. BOFA confines all model adaptation exclusively to CLIP's existing cross-modal bridge-layer, thereby adding no extra parameters or inference cost. To prevent forgetting within this layer, it leverages Orthogonal Low-Rank Fusion, a mechanism that constrains parameter updates to a low-rank "safe subspace" mathematically constructed to be orthogonal to past task features. This ensures stable knowledge accumulation without data replay. Furthermore, BOFA employs a cross-modal hybrid prototype that synergizes stable textual prototypes with visual counterparts derived from our stably adapted bridge-layer, enhancing classification performance. Extensive experiments on standard benchmarks show that BOFA achieves superior accuracy and efficiency compared to existing methods.

Lan Li, Tao Hu, Da-Wei Zhou, Jia-Qi Yang, Han-Jia Ye, De-Chuan Zhan • 2025
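
The abstract's two mechanisms, updates confined to a low-rank subspace orthogonal to past task features and a hybrid of textual and visual prototypes, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not say how the safe subspace is built or how the prototypes are combined. The sketch assumes a generic null-space projection via SVD and a simple weighted average with a hypothetical mixing weight `alpha`; all function names here are illustrative.

```python
import numpy as np


def l2norm(x):
    """Scale each row to unit Euclidean length."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def safe_subspace_basis(past_features, rank, tol=1e-10):
    """Return a (d, r) orthonormal basis, r <= rank, of directions
    orthogonal to every row of past_features (shape (n, d)).

    Assumption: the "safe subspace" is taken from the null space of the
    past feature matrix; the paper may construct it differently.
    """
    d = past_features.shape[1]
    _, s, vt = np.linalg.svd(past_features, full_matrices=True)
    # Pad singular values to length d so each row of vt has one.
    s_full = np.zeros(d)
    s_full[: s.shape[0]] = s
    threshold = tol * max(s_full.max(), 1.0)
    null_rows = vt[s_full <= threshold]  # rows spanning the null space
    return null_rows[:rank].T


def project_update(raw_update, basis):
    """Project a raw weight update (d_out, d) onto the safe subspace.

    The result has rank <= basis.shape[1] and maps every past feature
    to (numerically) zero, so the layer's outputs on past features are
    unchanged after the update is added to the weights.
    """
    return raw_update @ basis @ basis.T


def hybrid_prototypes(text_protos, visual_protos, alpha=0.5):
    """Blend per-class text and visual prototypes (both (C, d)).

    Assumption: a convex combination with weight alpha; the paper's
    actual fusion rule is not given in the abstract.
    """
    mixed = alpha * l2norm(text_protos) + (1.0 - alpha) * l2norm(visual_protos)
    return l2norm(mixed)


def classify(features, prototypes):
    """Assign each feature to the prototype with highest cosine similarity."""
    return (l2norm(features) @ prototypes.T).argmax(axis=1)
```

In this sketch, adding `project_update(g, basis)` to the layer weights leaves the layer's response to stored past features untouched, which is the sense in which the update is "safe" without replaying old data.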

Related benchmarks

Task	Dataset	Metric	Result
Class-incremental learning	CIFAR-100	Average Accuracy	86.07	150
Class-incremental learning	ImageNet-R	Last Accuracy	79.12	147
Class-incremental learning	ImageNet-R B0 Inc20	Last Accuracy	79.78	107
Class-incremental learning	CIFAR-100 B0_Inc10	Avg Accuracy	86.41	69
Class-incremental learning	ObjectNet	Average Accuracy	59.21	60
Class-incremental learning	CIFAR-100 B50Inc10	Avg Accuracy	0.8302	41
Class-incremental learning	FGVC Aircraft	Accuracy Last	61.36	41
Class-incremental learning	CUB200 (100-20)	Avg Accuracy	83.18	32
Class-incremental learning	ImageNet-R B0 Inc20 (test)	Average Performance (A-bar)	84.53	23
Class-incremental learning	ImageNet-100 B50 Inc10	Average Performance (A-bar)	81.23	21

