Evolving Prompt Adaptation for Vision-Language Models

About

The adaptation of large-scale vision-language models (VLMs) to downstream tasks with limited labeled data remains a significant challenge. While parameter-efficient prompt learning methods offer a promising path, they often suffer from catastrophic forgetting of pre-trained knowledge. Toward addressing this limitation, our work is grounded in the insight that governing the evolutionary path of prompts is essential for forgetting-free adaptation. To this end, we propose EvoPrompt, a novel framework designed to explicitly steer the prompt trajectory for stable, knowledge-preserving fine-tuning. Specifically, our approach employs a Modality-Shared Prompt Projector (MPP) to generate hierarchical prompts from a unified embedding space. Critically, an evolutionary training strategy decouples low-rank updates into directional and magnitude components, preserving early-learned semantic directions while only adapting their magnitude, thus enabling prompts to evolve without discarding foundational knowledge. This process is further stabilized by Feature Geometric Regularization (FGR), which enforces feature decorrelation to prevent representation collapse. Extensive experiments demonstrate that EvoPrompt achieves state-of-the-art performance in few-shot learning while robustly preserving the original zero-shot capabilities of pre-trained VLMs.
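
The abstract does not give formulas, so the following is only a minimal sketch of the two ideas it names: splitting a low-rank update into column directions and magnitudes, then adapting only the magnitudes, and a decorrelation penalty on features. The matrices `B` and `A`, the column-wise normalization, and the exact form of the penalty are all assumptions made for illustration and may differ from what EvoPrompt actually does.

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def decompose(delta, eps=1e-12):
    """Split a matrix into unit-norm column directions and per-column magnitudes."""
    magnitudes = [math.sqrt(sum(v * v for v in col)) for col in zip(*delta)]
    directions = [[v / max(m, eps) for v, m in zip(row, magnitudes)] for row in delta]
    return directions, magnitudes

def recompose(directions, magnitudes):
    """Rescale each fixed column direction by its magnitude."""
    return [[v * m for v, m in zip(row, magnitudes)] for row in directions]

def decorrelation_penalty(features, eps=1e-12):
    """Sum of squared off-diagonal entries of the feature correlation matrix.

    Assumed form of a decorrelation regularizer; `features` holds one row per sample.
    """
    n = len(features)
    cols = [list(c) for c in zip(*features)]
    centered = [[v - sum(c) / n for v in c] for c in cols]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    penalty = 0.0
    for i, ci in enumerate(centered):
        for j, cj in enumerate(centered):
            if i != j:
                corr = sum(x * y for x, y in zip(ci, cj)) / max(norms[i] * norms[j], eps)
                penalty += corr * corr
    return penalty

# Hypothetical rank-1 low-rank update: delta = B @ A.
B = [[1.0], [2.0]]
A = [[3.0, 4.0]]
delta = matmul(B, A)

# Early phase: record the learned directions.
directions, magnitudes = decompose(delta)

# Later phase: keep the directions frozen and change only the magnitudes.
evolved = recompose(directions, [1.0, 2.0])
```

In this sketch, `evolved` points along the same column directions as `delta` and differs only in scale, which is the sense in which earlier directions are preserved. The penalty is zero when feature dimensions are uncorrelated and grows as they become redundant.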

Enming Zhang, Jiayang Li, Yanru Wu, Zhenyu Liu, Yang Li • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Image Classification | Flowers102 | -- | 558 |
| Image Classification | Food101 | -- | 457 |
| Image Classification | StanfordCars | -- | 312 |
| Image Classification | Caltech101 | Base Accuracy 98.3 | 148 |
| Image Classification | OxfordPets | Base Accuracy 95.3 | 137 |
| Image Classification | EuroSAT | Base Accuracy 94.1 | 104 |
| Image Classification | UCF101 | Base Classes Acc 87.5 | 100 |
| Image Classification | ImageNet Domain Generalization (Source: ImageNet, Targets: ImageNetV2, ImageNet-Sketch, ImageNet-A, ImageNet-R) (test) | Accuracy (ImageNetV2) 64.4 | 84 |
| Image Classification | Average 11 datasets | Base Accuracy 84.28 | 83 |
| Image Classification | DTD | Base Accuracy 83.1 | 40 |
Showing 10 of 14 rows
