
Reviving In-domain Fine-tuning Methods for Source-Free Cross-domain Few-shot Learning

About

Cross-Domain Few-Shot Learning (CDFSL) aims to adapt large-scale pretrained models to specialized target domains with limited samples, yet the few-shot fine-tuning of vision-language models like CLIP remains underexplored. By establishing multiple fine-tuning baselines of CLIP for CDFSL, we find that adapter-based methods (e.g., LoRA) consistently outperform prompt-based ones (e.g., MaPLe), contrary to what is observed in in-domain scenarios. To make those effective in-domain methods competitive again in CDFSL, we analyze this phenomenon and discover that LoRA's superiority stems from rectifying the collapsed attention of the visual CLS token, which enhances modality alignment and class separation by focusing on text-related visual regions. Further, we find that the textual EOS token exhibits much better attention to visual samples, and that CLIP's standard contrastive loss only weakly constrains modality alignment. Based on these insights, we propose Semantic Probe, a plug-and-play attention rectification framework for both adapter- and prompt-based methods. Extensive experiments on four CDFSL benchmarks validate our rationale: Semantic Probe achieves state-of-the-art performance and benefits both fine-tuning paradigms. Code will be released.

Yaze Zhao, Yicong Liu, Yixiong Zou, Yuhua Li, Ruixuan Li • 2026
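
For readers unfamiliar with the loss the abstract refers to, the following is a minimal sketch of the symmetric contrastive loss commonly associated with CLIP, written in plain Python. It is not the authors' code and does not implement Semantic Probe. The function names, the toy feature vectors, and the fixed temperature of 0.07 are assumptions made for illustration only.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy(logits, target):
    """Cross-entropy of one row of logits against a target index (stable log-sum-exp)."""
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_z - logits[target]


def clip_contrastive_loss(image_feats, text_feats, temperature=0.07):
    """Symmetric image-to-text and text-to-image loss over a batch of matched pairs.

    Assumption: row k of image_feats is paired with row k of text_feats,
    and the temperature is a fixed hypothetical value rather than a learned one.
    """
    images = [l2_normalize(v) for v in image_feats]
    texts = [l2_normalize(v) for v in text_feats]
    n = len(images)
    # logits[i][j] = cosine similarity of image i and text j, divided by temperature
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
        for img in images
    ]
    image_to_text = sum(cross_entropy(logits[k], k) for k in range(n)) / n
    text_to_image = sum(
        cross_entropy([logits[row][k] for row in range(n)], k) for k in range(n)
    ) / n
    return 0.5 * (image_to_text + text_to_image)
```

The loss only rewards the matched pair for scoring higher than the mismatched pairs in the batch. It places no direct constraint on which visual regions the features attend to, which is one way to read the abstract's remark that it weakly constrains modality alignment.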

Related benchmarks

Task | Dataset | Result | Rank
5-way 1-shot Few-Shot Classification | BSCD-FSL Suite (ChestX, ISIC, EuroSAT, CropDisease, CUB, Cars, Places, Plantae) 1.0 (test) | ChestX Accuracy: 0.2365 | 28
Few-shot Image Classification | BSCD-FSL ChestX, ISIC, EuroSAT, CropDiseases 5-way 1-shot | ChestX Accuracy: 23.65 | 17
5-way 5-shot Classification | BSCD-FSL (test) | Accuracy (ChestX): 25.79 | 17
Few-shot classification | BSCD-FSL 5-way 5-shot | Accuracy (ISIC, 5-way 5-shot): 55.95 | 16
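
The "5-way 1-shot" and "5-way 5-shot" settings in the benchmarks above describe how each evaluation episode is built: 5 classes are drawn, with 1 or 5 labeled support samples per class, and the model is scored on held-out query samples from the same classes. A minimal sketch of such an episode sampler is shown below. The function name, the dictionary layout, and the default of 15 query samples per class are assumptions for illustration, not details taken from the paper or from the BSCD-FSL benchmark code.

```python
import random


def sample_episode(items_by_class, n_way=5, k_shot=1, n_query=15, seed=None):
    """Draw one N-way K-shot episode from a {class_name: [items]} mapping.

    Returns (support, query), each a list of (item, episode_label) pairs,
    where episode_label is in 0..n_way-1. n_query is a hypothetical default.
    """
    rng = random.Random(seed)
    # Pick the N classes for this episode; sorting keeps sampling reproducible
    chosen = rng.sample(sorted(items_by_class), n_way)
    support, query = [], []
    for episode_label, cls in enumerate(chosen):
        # Draw support and query items together so they never overlap
        picked = rng.sample(items_by_class[cls], k_shot + n_query)
        support += [(item, episode_label) for item in picked[:k_shot]]
        query += [(item, episode_label) for item in picked[k_shot:]]
    return support, query
```

Reported accuracy in this setting is usually an average over many such episodes, so a single episode says little on its own.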
