Reviving In-domain Fine-tuning Methods for Source-Free Cross-domain Few-shot Learning

About

Cross-Domain Few-Shot Learning (CDFSL) aims to adapt large-scale pretrained models to specialized target domains with limited samples, yet the few-shot fine-tuning of vision-language models like CLIP remains underexplored. By establishing multiple fine-tuning baselines of CLIP for CDFSL, we find that adapter-based methods (e.g., LoRA) consistently outperform prompt-based ones (e.g., MaPLe), contrary to in-domain scenarios. To make those effective in-domain methods competitive again in CDFSL, we analyze this phenomenon and discover that LoRA's superiority stems from rectifying the collapsed attention of the visual CLS token, which enhances modality alignment and class separation by focusing on text-related visual regions. Further, we find that the textual EOS token exhibits much better attention to visual samples, and that CLIP's standard contrastive loss only weakly constrains modality alignment. Based on these insights, we propose Semantic Probe, a plug-and-play attention rectification framework for both adapter- and prompt-based methods. Extensive experiments on four CDFSL benchmarks validate our rationale: Semantic Probe achieves state-of-the-art performance and benefits both fine-tuning paradigms. Code will be released.
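
For readers unfamiliar with the contrastive loss the abstract refers to, the following is a minimal NumPy sketch of the symmetric image-text contrastive objective as it is commonly described for CLIP pretraining. It is an illustration only, not the authors' code: the function names, the temperature value of 0.07, and the toy inputs are assumptions, and few-shot fine-tuning setups often use only the image-to-text direction over class prompts. Note that the loss is computed from the final pooled embeddings alone.

```python
import numpy as np


def log_softmax(x, axis):
    """Numerically stable log-softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def clip_contrastive_loss(image_feats, text_feats, temperature=0.07):
    """Symmetric contrastive loss over N matched image-text pairs.

    image_feats, text_feats: arrays of shape (N, D); row i of each is a pair.
    temperature: assumed fixed here (CLIP learns it during pretraining).
    """
    # L2-normalize so the dot product is a cosine similarity.
    img = image_feats / np.linalg.norm(image_feats, axis=1, keepdims=True)
    txt = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)

    # (N, N) similarity matrix; the diagonal holds the matched pairs.
    logits = img @ txt.T / temperature
    idx = np.arange(logits.shape[0])

    # Cross-entropy in both directions: image-to-text and text-to-image.
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_i2t + loss_t2i)


if __name__ == "__main__":
    feats = np.eye(4)  # toy embeddings: four mutually orthogonal pairs
    print("aligned:", clip_contrastive_loss(feats, feats))
    print("shuffled:", clip_contrastive_loss(feats, np.roll(feats, 1, axis=0)))
```

In this toy case the aligned pairs yield a much lower loss than the shuffled ones, which is all the objective rewards: agreement between the two pooled embeddings.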

Yaze Zhao, Yicong Liu, Yixiong Zou, Yuhua Li, Ruixuan Li • 2026

Related benchmarks

Task	Dataset	Result
5-way 1-shot Few-Shot Classification	BSCD-FSL Suite (ChestX, ISIC, EuroSAT, CropDisease, CUB, Cars, Places, Plantae) 1.0 (test)	ChestX Accuracy: 0.2365	28
Few-shot Image Classification	BSCD-FSL ChestX, ISIC, EuroSAT, CropDiseases 5-way 1-shot	ChestX Accuracy: 23.65	17
5-way 5-shot Classification	BSCD-FSL (test)	Accuracy (ChestX): 25.79	17
Few-shot classification	BSCD-FSL 5-way 5-shot	Accuracy (ISIC, 5-way 5-shot): 55.95	16
