
Continual Learning with Vision-Language Models via Semantic-Geometry Preservation

About

Continual learning of pretrained vision-language models (VLMs) is prone to catastrophic forgetting, yet current approaches adapt to new tasks without explicitly preserving the cross-modal semantic geometry inherited from pretraining and previous stages, allowing new-task supervision to induce geometric distortion. We observe that the most pronounced drift tends to concentrate in vulnerable neighborhoods near the old-new semantic interface, where shared visual patterns are easily re-explained by new textual semantics. To address this under an exemplar-free constraint, we propose Semantic Geometry Preservation for Continual Learning (SeGP-CL). SeGP-CL first probes the drift-prone region by constructing a compact set of adversarial anchors with dual-targeted projected gradient descent (DPGD), which drives selected new-task seeds toward old-class semantics while remaining faithful in raw visual space. During training, we preserve cross-modal structure by anchor-guided cross-modal geometry distillation (ACGD), and stabilize the textual reference frame across tasks via a lightweight text semantic-geometry regularization (TSGR). After training, we estimate anchor-induced raw-space drift to transfer old visual prototypes and perform dual-path inference by fusing cross-modal and visual cues. Extensive experiments on five continual learning benchmarks demonstrate that SeGP-CL consistently improves stability and forward transfer, achieving state-of-the-art performance while better preserving semantic geometry of VLMs.
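The anchor construction described above builds on projected gradient descent. The abstract does not give the DPGD objective, so the following is only a minimal sketch of plain targeted PGD, not the authors' method. It assumes a toy linear encoder `W`, a single old-class target embedding, a dot-product objective, and an L-infinity ball of radius `eps` as a stand-in for "remaining faithful in raw visual space". All names and hyperparameters are hypothetical.

```python
import math

def encode(W, x):
    """Toy linear 'image encoder' (an assumption): returns W @ x as a list."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]

def dot(u, v):
    """Dot product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def targeted_pgd(W, seed, target, eps=0.1, step=0.02, iters=20):
    """Push encode(W, x) toward `target` while keeping max|x - seed| <= eps."""
    x = list(seed)
    # For a linear encoder, the gradient of <W x, target> with respect to x
    # is W^T target, which does not depend on x.
    grad = [sum(W[i][j] * target[i] for i in range(len(W)))
            for j in range(len(seed))]
    for _ in range(iters):
        # Signed gradient ascent step on the similarity objective.
        x = [xj + step * math.copysign(1.0, g) if g != 0 else xj
             for xj, g in zip(x, grad)]
        # Projection back onto the L-infinity ball around the seed.
        x = [min(max(xj, sj - eps), sj + eps) for xj, sj in zip(x, seed)]
    return x

# Hypothetical toy data: a 3-dimensional "image" and a 2-dimensional embedding.
W = [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5]]
seed = [1.0, 0.0, 0.0]      # new-task seed sample
target = [0.0, 1.0]         # old-class semantic direction
anchor = targeted_pgd(W, seed, target)
```

In a real VLM the gradient would come from automatic differentiation through the image encoder and would be recomputed at every step; the constant gradient here is a consequence of the linear toy encoder only.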

Chiyuan He, Zihuan Qiu, Fanman Meng, Runtong Zhang, Linfeng Xu, Qingbo Wu, Hongliang Li • 2026

Related benchmarks

Task | Dataset | Result | Rank
Image Classification | Food101 | Accuracy 84.5 | 457
Class-incremental learning | CUB200 10 Tasks | FN (Final Acc) 80.1 | 59
Class-incremental learning | ImageNet-R 10-task | -- | 54
Image Classification | ImageNet 1k (full) | Top-1 Acc 67.6 | 53
Class-incremental classification | CIFAR100 10 Tasks | Average Accuracy 89.8 | 16
Class-incremental classification | ImageNet Sub 10 tasks | Average Accuracy 89.9 | 16
Image Classification | Oxford Pets | Accuracy 87.7 | 15
Class-incremental classification | UCF101 10 Tasks | Average Accuracy 95.9 | 9
Image Classification | CIFAR-Last | Accuracy 84.6 | 8
Global Visual-Text Matching | CIFAR100 (test) | Forward Transfer (FWT) 72.3 | 5
