BOFA: Bridge-Layer Orthogonal Low-Rank Fusion for CLIP-Based Class-Incremental Learning
About
Class-Incremental Learning (CIL) aims to learn new categories continually without forgetting previously acquired knowledge. Vision-language models such as CLIP provide strong, transferable representations learned through multi-modal supervision, which makes them promising for CIL. However, applying CLIP to CIL raises two major challenges: (1) adapting to downstream tasks often requires additional learnable modules, which increases model complexity and susceptibility to forgetting; and (2) although visual and textual representations have complementary strengths, existing methods do not yet integrate the two modalities effectively. To address these issues, we propose BOFA (Bridge-layer Orthogonal Fusion for Adaptation), a novel framework for CIL. BOFA confines all model adaptation to CLIP's existing cross-modal bridge layer, so it adds no extra parameters or inference cost. To prevent forgetting within this layer, it uses Orthogonal Low-Rank Fusion, a mechanism that constrains parameter updates to a low-rank "safe subspace" constructed to be orthogonal to the features of past tasks. This ensures stable knowledge accumulation without data replay. Furthermore, BOFA employs a cross-modal hybrid prototype that combines stable textual prototypes with visual prototypes derived from the stably adapted bridge layer, improving classification performance. Extensive experiments on standard benchmarks show that BOFA achieves higher accuracy and better efficiency than existing methods.
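The abstract does not spell out how the "safe subspace" is built. As a minimal sketch, assume the bridge layer is a linear map W (k x d) applied to d-dimensional features, the update is low-rank (ΔW = B A with rank r), and "orthogonal to past task features" means each row of A has no component in the span of stored past-task features. Projecting A onto the orthogonal complement of that span (here via Gram-Schmidt) then guarantees W' x = W x for every past feature x, and merging ΔW back into W adds no parameters at inference. Other constructions, such as an SVD of feature statistics, are equally plausible; all names below (`orthogonal_low_rank_update`, `project_out`, etc.) are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major: Matrix[i][j] is row i, column j


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def orthonormal_basis(features: List[Vector], tol: float = 1e-8) -> List[Vector]:
    """Gram-Schmidt: orthonormal basis U for the span of past-task features."""
    basis: List[Vector] = []
    for f in features:
        v = list(f)
        for b in basis:
            c = dot(v, b)
            v = [vi - c * bi for vi, bi in zip(v, b)]
        norm = dot(v, v) ** 0.5
        if norm > tol:  # skip features already (numerically) in the span
            basis.append([vi / norm for vi in v])
    return basis


def project_out(row: Vector, basis: List[Vector]) -> Vector:
    """Return row @ (I - U U^T): the part of `row` orthogonal to span(U)."""
    out = list(row)
    for b in basis:
        c = dot(out, b)
        out = [oi - c * bi for oi, bi in zip(out, b)]
    return out


def matmul(X: Matrix, Y: Matrix) -> Matrix:
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def matvec(M: Matrix, x: Vector) -> Vector:
    return [dot(row, x) for row in M]


def orthogonal_low_rank_update(W: Matrix, B: Matrix, A: Matrix,
                               past_features: List[Vector]) -> Matrix:
    """Hypothetical sketch: W' = W + B @ A_safe, rows of A projected off past features.

    W: k x d stand-in for the bridge-layer weight, B: k x r, A: r x d.
    For any past feature x, A_safe @ x = 0, hence W' @ x == W @ x.
    """
    basis = orthonormal_basis(past_features)
    A_safe = [project_out(row, basis) for row in A]
    delta = matmul(B, A_safe)  # rank <= r
    # Merge the update into W, so inference uses a single k x d matrix.
    return [[w + d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]


# Usage: the merged update leaves the response to the past feature unchanged.
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
B = [[1.0], [2.0]]          # k=2, r=1
A = [[0.5, -1.0, 3.0]]      # r=1, d=3
past = [[1.0, 0.0, 0.0]]
W_new = orthogonal_low_rank_update(W, B, A, past)
print(matvec(W_new, past[0]), matvec(W, past[0]))  # [1.0, 0.0] [1.0, 0.0]
```

The projection is applied to A rather than to the full ΔW so that the constraint costs only r projections per step; inputs outside the past-feature span (for example `[0.0, 1.0, 0.0]` above) still see the new update.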
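The cross-modal hybrid prototype can be illustrated in a similar spirit. The abstract states only that textual prototypes are combined with visual prototypes from the adapted bridge layer; the sketch below assumes that a visual prototype is the mean of L2-normalized image embeddings per class and that the two are fused at the score level with a hypothetical weight `alpha`. The actual fusion rule in BOFA may differ, and the 2-D embeddings are toy values, not CLIP outputs.

```python
import math
from typing import Dict, List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def cosine(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(l2_normalize(u), l2_normalize(v)))


def visual_prototypes(feats_by_class: Dict[str, List[Vector]]) -> Dict[str, Vector]:
    """Assumed: class mean of L2-normalized image embeddings (after the bridge layer)."""
    protos: Dict[str, Vector] = {}
    for c, feats in feats_by_class.items():
        normed = [l2_normalize(f) for f in feats]
        protos[c] = [sum(col) / len(normed) for col in zip(*normed)]
    return protos


def hybrid_scores(x: Vector, text_protos: Dict[str, Vector],
                  vis_protos: Dict[str, Vector], alpha: float = 0.5) -> Dict[str, float]:
    """Hypothetical fusion: alpha * cos(x, text) + (1 - alpha) * cos(x, visual)."""
    return {c: alpha * cosine(x, text_protos[c]) + (1 - alpha) * cosine(x, vis_protos[c])
            for c in text_protos}


def predict(x: Vector, text_protos: Dict[str, Vector],
            vis_protos: Dict[str, Vector], alpha: float = 0.5) -> str:
    scores = hybrid_scores(x, text_protos, vis_protos, alpha)
    return max(scores, key=scores.get)


# Toy embeddings (hypothetical values for illustration only).
text_protos = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
vis_protos = visual_prototypes({"cat": [[0.9, 0.1], [1.0, 0.0]], "dog": [[0.2, 0.8]]})
print(predict([0.8, 0.2], text_protos, vis_protos))  # cat
```

Setting `alpha=1.0` recovers purely textual (zero-shot-style) classification, while `alpha=0.0` uses only the visual prototypes; intermediate values trade the stability of text anchors against the task-specific visual statistics.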
Related benchmarks
| Task | Dataset | Metric | Result (%) | Rank |
|---|---|---|---|---|
| Class-incremental learning | CIFAR-100 | Average Accuracy | 86.07 | 150 |
| Class-incremental learning | ImageNet-R | Last Accuracy | 79.12 | 147 |
| Class-incremental learning | ImageNet-R B0 Inc20 | Last Accuracy | 79.78 | 98 |
| Class-incremental learning | CIFAR-100 B0 Inc10 | Average Accuracy | 86.07 | 60 |
| Class-incremental learning | ObjectNet | Average Accuracy | 59.21 | 60 |
| Class-incremental learning | CIFAR-100 B50 Inc10 | Average Accuracy | 83.02 | 41 |
| Class-incremental learning | FGVC Aircraft | Last Accuracy | 61.36 | 41 |
| Class-incremental learning | CUB200 (100-20) | Average Accuracy | 83.18 | 32 |
| Class-incremental learning | ImageNet-R B0 Inc20 (test) | Average Performance (Ā) | 84.53 | 23 |
| Class-incremental learning | ImageNet-100 B50 Inc10 | Average Performance (Ā) | 81.23 | 21 |
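The benchmark names and metrics above follow conventions that are common in CLIP-based CIL work, though this page does not define them. A minimal sketch, assuming the usual meanings: "B{m} Inc{n}" means the first stage contains m classes (or n classes when m is 0) and each later stage adds n; Last Accuracy is the accuracy on all seen classes after the final stage; Average Accuracy (Ā) is the mean of those per-stage accuracies. Function names are hypothetical, and the per-stage numbers are toy values, not BOFA results.

```python
from typing import Sequence


def num_stages(total_classes: int, base: int, inc: int) -> int:
    """Number of incremental stages under the assumed 'B{base} Inc{inc}' convention."""
    if base == 0:
        return total_classes // inc
    return 1 + (total_classes - base) // inc


def last_accuracy(stage_acc: Sequence[float]) -> float:
    """Accuracy on all classes seen so far, measured after the final stage."""
    return stage_acc[-1]


def average_accuracy(stage_acc: Sequence[float]) -> float:
    """A-bar: mean over stages of the accuracy on all classes seen so far."""
    return sum(stage_acc) / len(stage_acc)


print(num_stages(100, 0, 10), num_stages(100, 50, 10))  # 10 6
toy = [90.0, 80.0, 70.0]  # hypothetical per-stage accuracies
print(last_accuracy(toy), average_accuracy(toy))  # 70.0 80.0
```

Because accuracy usually falls as classes accumulate, Average Accuracy is typically higher than Last Accuracy for the same run, which is consistent with the gap between the two ImageNet-R B0 Inc20 rows.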