Evolving Prompt Adaptation for Vision-Language Models

About

The adaptation of large-scale vision-language models (VLMs) to downstream tasks with limited labeled data remains a significant challenge. While parameter-efficient prompt learning methods offer a promising path, they often suffer from catastrophic forgetting of pre-trained knowledge. Toward addressing this limitation, our work is grounded in the insight that governing the evolutionary path of prompts is essential for forgetting-free adaptation. To this end, we propose EvoPrompt, a novel framework designed to explicitly steer the prompt trajectory for stable, knowledge-preserving fine-tuning. Specifically, our approach employs a Modality-Shared Prompt Projector (MPP) to generate hierarchical prompts from a unified embedding space. Critically, an evolutionary training strategy decouples low-rank updates into directional and magnitude components, preserving early-learned semantic directions while only adapting their magnitude, thus enabling prompts to evolve without discarding foundational knowledge. This process is further stabilized by Feature Geometric Regularization (FGR), which enforces feature decorrelation to prevent representation collapse. Extensive experiments demonstrate that EvoPrompt achieves state-of-the-art performance in few-shot learning while robustly preserving the original zero-shot capabilities of pre-trained VLMs.
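The abstract names two mechanisms without giving formulas: splitting a low-rank update into direction and magnitude, and a decorrelation penalty on features. The sketch below is one plausible reading, not the authors' implementation. The row-wise norm, the correlation-based penalty, and every name in it (`split_direction_magnitude`, `decorrelation_penalty`, `scale`) are assumptions made for illustration.

```python
import numpy as np

def split_direction_magnitude(delta, eps=1e-12):
    """Split each row of `delta` into a unit direction and a scalar magnitude.

    Assumption: rows are the unit of decomposition; the abstract does not say.
    """
    magnitude = np.linalg.norm(delta, axis=1, keepdims=True)
    direction = delta / np.maximum(magnitude, eps)
    return direction, magnitude

def decorrelation_penalty(features, eps=1e-8):
    """Sum of squared off-diagonal entries of the feature correlation matrix.

    Assumption: a generic decorrelation loss, used here as a stand-in for FGR.
    """
    centered = features - features.mean(axis=0, keepdims=True)
    standardized = centered / (centered.std(axis=0, keepdims=True) + eps)
    corr = standardized.T @ standardized / features.shape[0]
    off_diag = corr - np.diag(np.diag(corr))
    return float(np.sum(off_diag ** 2))

rng = np.random.default_rng(0)
rank, d_in, d_out = 4, 16, 8           # hypothetical sizes
A = rng.normal(size=(rank, d_in))      # low-rank factors of the update
B = rng.normal(size=(d_out, rank))
delta = B @ A                          # low-rank update, shape (d_out, d_in)

# Early stage: record the learned directions, then freeze them.
direction, magnitude = split_direction_magnitude(delta)
frozen_direction = direction.copy()

# Later stage: only the magnitudes change (random positive factors stand in
# for learned updates here).
scale = rng.uniform(0.5, 2.0, size=magnitude.shape)
evolved = frozen_direction * (magnitude * scale)

# Each evolved row still points the same way as the original row.
cosine = np.sum(evolved * delta, axis=1) / (
    np.linalg.norm(evolved, axis=1) * np.linalg.norm(delta, axis=1)
)
```

Under these assumptions every entry of `cosine` is 1 up to floating-point error, which is the sense in which the early directions are preserved while their magnitudes adapt.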

Enming Zhang, Jiayang Li, Yanru Wu, Zhenyu Liu, Yang Li • 2026

Related benchmarks

Task	Dataset	Result
Image Classification	Flowers102	--	558
Image Classification	Food101	--	457
Image Classification	StanfordCars	--	384
Image Classification	OxfordPets	H Score: 96.68	182
Image Classification	Caltech101	Base Accuracy: 98.3	148
Image Classification	UCF101	Base Classes Acc: 87.5	139
Image Classification	ImageNet Domain Generalization (Source: ImageNet, Targets: ImageNetV2, ImageNet-Sketch, ImageNet-A, ImageNet-R) (test)	Accuracy (ImageNetV2): 64.4	105
Image Classification	EuroSAT	Base Accuracy: 94.1	104
Image Classification	Average 11 datasets	Base Accuracy: 84.28	95
Image Classification	DTD	Base Accuracy: 83.1	52
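
The table reports an "H Score" without defining it. In base-to-novel evaluations of prompt learning, H is conventionally the harmonic mean of base-class and novel-class accuracy. The sketch below assumes that convention applies here. The function name and the input values are hypothetical and are not taken from the table.

```python
def harmonic_mean_score(base_acc: float, novel_acc: float) -> float:
    """Harmonic mean of base-class and novel-class accuracy (in percent)."""
    if base_acc + novel_acc == 0:
        return 0.0
    return 2.0 * base_acc * novel_acc / (base_acc + novel_acc)

# Hypothetical accuracies, chosen only to illustrate the formula.
h_example = harmonic_mean_score(80.0, 60.0)  # 2 * 80 * 60 / 140, about 68.57
```

The harmonic mean sits closer to the lower of the two accuracies, so a method cannot reach a high H by improving base-class accuracy at the expense of novel classes.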

Showing 10 of 14 rows
