
Synthetic Data is an Elegant GIFT for Continual Vision-Language Models

About

Pre-trained Vision-Language Models (VLMs) require Continual Learning (CL) to efficiently update their knowledge and adapt to various downstream tasks without retraining from scratch. However, for VLMs, in addition to the loss of knowledge previously learned from downstream tasks, pre-training knowledge is also corrupted during continual fine-tuning. This issue is exacerbated by the unavailability of the original pre-training data, causing the VLM's generalization ability to degrade. In this paper, we propose GIFT, a novel continual fine-tuning approach that utilizes synthetic data to overcome catastrophic forgetting in VLMs. Taking advantage of recent advances in text-to-image synthesis, we employ a pre-trained diffusion model to recreate both the pre-training data and the learned downstream task data. In this way, the VLM can revisit previous knowledge through distillation on matching diffusion-generated images and their corresponding text prompts. Leveraging the broad distribution of, and high alignment between, synthetic image-text pairs in the VLM's feature space, we propose a contrastive distillation loss along with an image-text alignment constraint. To further combat in-distribution overfitting and enhance distillation performance with a limited amount of generated data, we incorporate adaptive weight consolidation, which utilizes Fisher information from these synthetic image-text pairs and achieves a better stability-plasticity balance. Extensive experiments demonstrate that our method consistently outperforms previous state-of-the-art approaches across various settings.
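The abstract does not give the exact loss formulas, but the two distillation objectives it names can be illustrated with a minimal numpy sketch. This is a hypothetical formulation, not the paper's implementation: `contrastive_distillation_loss` distills the frozen teacher's image-to-text similarity distribution into the student (a CLIP-style contrastive distillation), and `alignment_constraint` penalizes drift between matching synthetic image and text embeddings. The function names, temperature value, and shapes are all assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def l2_normalize(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def contrastive_distillation_loss(img_s, txt_s, img_t, txt_t, tau=0.07):
    """Cross-entropy between teacher and student image-to-text
    similarity distributions over a batch of synthetic pairs.

    All inputs have shape (N, D): teacher embeddings (img_t, txt_t) come
    from the frozen pre-trained VLM, student embeddings (img_s, txt_s)
    from the model being fine-tuned. Hypothetical formulation.
    """
    img_s, txt_s = l2_normalize(img_s), l2_normalize(txt_s)
    img_t, txt_t = l2_normalize(img_t), l2_normalize(txt_t)
    p_t = softmax(img_t @ txt_t.T / tau)                  # teacher targets
    log_p_s = np.log(softmax(img_s @ txt_s.T / tau) + 1e-12)
    # cross-entropy = H(teacher) + KL(teacher || student), so >= 0
    return float(-(p_t * log_p_s).sum(axis=1).mean())

def alignment_constraint(img_s, txt_s):
    """Keep each synthetic image embedding aligned with its own caption:
    1 - mean cosine similarity of matching pairs (0 when identical)."""
    img_s, txt_s = l2_normalize(img_s), l2_normalize(txt_s)
    return float(1.0 - (img_s * txt_s).sum(axis=1).mean())
```

In a training loop, these two terms would be added to the task loss; the adaptive weight consolidation the abstract mentions would further add an EWC-style quadratic penalty weighted by Fisher information estimated on the synthetic pairs.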

Bin Wu, Wuxuan Shi, Jinqiao Wang, Mang Ye • 2025

Related benchmarks

| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Incremental Learning | CIFAR100 10 steps | Final Step Performance | 77.7 | 39 |
| Incremental Learning | CIFAR100 50 steps | Last Accuracy | 71.29 | 36 |
| Class-incremental learning | CIFAR100 20 steps (test) | Last Accuracy | 73.73 | 21 |
| Class-incremental learning | TinyImageNet 5 steps 100 base classes (test) | Avg Score | 81.16 | 13 |
| Class-incremental learning | TinyImageNet 10 steps 100 base classes (test) | Avg Accuracy | 80.2 | 13 |
| Class-incremental learning | TinyImageNet 20 steps 100 base classes (test) | Average Accuracy | 79.32 | 13 |
| Continual Learning | MedXtreme (Order II) | Accuracy | 65.7 | 13 |
| Continual Learning | MedXtreme (Order I) | ACC | 66 | 13 |
| Continual Learning | HieraMedTransfer Order I | Transfer Performance | 53.4 | 13 |
| Continual Learning | HieraMedTransfer Order II | Transfer Score | 46.8 | 13 |

Showing 10 of 12 rows.
