Continual Learning with Vision-Language Models via Semantic-Geometry Preservation

About

Continual learning of pretrained vision-language models (VLMs) is prone to catastrophic forgetting, yet current approaches adapt to new tasks without explicitly preserving the cross-modal semantic geometry inherited from pretraining and previous stages, allowing new-task supervision to induce geometric distortion. We observe that the most pronounced drift tends to concentrate in vulnerable neighborhoods near the old-new semantic interface, where shared visual patterns are easily re-explained by new textual semantics. To address this under an exemplar-free constraint, we propose Semantic Geometry Preservation for Continual Learning (SeGP-CL). SeGP-CL first probes the drift-prone region by constructing a compact set of adversarial anchors with dual-targeted projected gradient descent (DPGD), which drives selected new-task seeds toward old-class semantics while remaining faithful in raw visual space. During training, we preserve cross-modal structure by anchor-guided cross-modal geometry distillation (ACGD), and stabilize the textual reference frame across tasks via a lightweight text semantic-geometry regularization (TSGR). After training, we estimate anchor-induced raw-space drift to transfer old visual prototypes and perform dual-path inference by fusing cross-modal and visual cues. Extensive experiments on five continual learning benchmarks demonstrate that SeGP-CL consistently improves stability and forward transfer, achieving state-of-the-art performance while better preserving semantic geometry of VLMs.

Chiyuan He, Zihuan Qiu, Fanman Meng, Runtong Zhang, Linfeng Xu, Qingbo Wu, Hongliang Li • 2026
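
The anchor-construction step described in the abstract builds on projected gradient descent. The sketch below is not the paper's DPGD. It is a generic single-target PGD on a toy linear encoder, shown only to illustrate the idea of pushing a new-task seed toward an old-class embedding while keeping it close to the seed in raw input space. The encoder matrix, the squared-distance loss, the L-infinity budget `eps`, and the step size are all assumptions made for illustration. The abstract does not specify the dual-targeted objective.

```python
import numpy as np

def targeted_pgd(seed, encoder, target, eps=0.03, step=0.005, n_steps=20):
    """Toy targeted PGD (illustrative, not the paper's DPGD).

    Moves encoder @ x toward `target` while keeping x inside an
    L-infinity ball of radius `eps` around `seed` (raw-space fidelity).
    Returns the iterate with the lowest loss seen, including the seed.
    """
    def loss(x):
        # Squared distance in embedding space (an assumed objective).
        return 0.5 * np.sum((encoder @ x - target) ** 2)

    x = seed.copy()
    best_x, best_loss = x.copy(), loss(x)
    for _ in range(n_steps):
        # Gradient of the squared-distance loss for a linear encoder.
        grad = encoder.T @ (encoder @ x - target)
        # Signed descent step toward the target embedding.
        x = x - step * np.sign(grad)
        # Project back into the eps-ball around the seed, then into [0, 1].
        x = np.clip(x, seed - eps, seed + eps)
        x = np.clip(x, 0.0, 1.0)
        cur = loss(x)
        if cur < best_loss:
            best_x, best_loss = x.copy(), cur
    return best_x

# Hypothetical toy data: a random linear "encoder", a seed image vector,
# and an old-class target embedding.
rng = np.random.default_rng(0)
encoder = rng.normal(size=(8, 32))
seed = rng.uniform(0.2, 0.8, size=32)
target = rng.normal(size=8)

anchor = targeted_pgd(seed, encoder, target)
dist_before = np.linalg.norm(encoder @ seed - target)
dist_after = np.linalg.norm(encoder @ anchor - target)
```

Here `anchor` stays within `eps` of `seed` in every coordinate, and `dist_after` is never larger than `dist_before` because the function returns the best iterate seen. A real VLM would replace the linear map with the image encoder and obtain the gradient by automatic differentiation.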

Related benchmarks

Task	Dataset	Metric	Result	(unlabeled)
Image Classification	Food101	Accuracy	84.5	457
Class-incremental learning	CUB200 10 Tasks	FN (Final Acc)	80.1	59
Class-incremental learning	ImageNet-R 10-task	--	--	54
Image Classification	ImageNet 1k (full)	Top-1 Acc	67.6	53
Image Classification	Oxford Pets	Accuracy	87.7	22
Class-incremental classification	CIFAR100 10 Tasks	Average Accuracy	89.8	16
Class-incremental classification	ImageNet Sub 10 tasks	Average Accuracy	89.9	16
Class-incremental classification	UCF101 10 Tasks	Average Accuracy	95.9	9
Image Classification	CIFAR-Last	Accuracy	84.6	8
Global Visual-Text Matching	CIFAR100 (test)	Forward Transfer (FWT)	72.3	5

Showing 10 of 11 rows
