Mind the Discriminability Trap in Source-Free Cross-domain Few-shot Learning

About

Source-Free Cross-Domain Few-Shot Learning (SF-CDFSL) focuses on fine-tuning with limited training data from target domains (e.g., medical or satellite images), where Vision-Language Models (VLMs) such as CLIP and SigLIP have shown promising results. Current works in traditional visual models suggest that improving visual discriminability enhances performance. However, in VLM-based SF-CDFSL tasks, we find that \textbf{strengthening visual-modal discriminability actually suppresses VLMs' performance}. In this paper, we aim to delve into this phenomenon for an interpretation and a solution. By both theoretical and experimental proofs, our study reveals that fine-tuning with the typical cross-entropy loss ($\mathcal{L}_{\mathrm{vlm}}$) inherently includes a visual learning part and a cross-modal learning part, where the cross-modal part is crucial for rectifying the heavily disrupted modality misalignment in SF-CDFSL. However, we find that the visual learning essentially acts as a shortcut that encourages the model to reduce $\mathcal{L}_{\mathrm{vlm}}$ without considering the cross-modal part, therefore hindering the cross-modal alignment and harming the performance. Based on this interpretation, we further propose an approach to address this problem: first, we perturb the visual learning to guide the model to focus on the cross-modal alignment. Then, we use the visual-text semantic relationships to gradually align the visual and textual modalities during the fine-tuning. Extensive experiments on various settings, backbones (CLIP, SigLip, PE-Core), and tasks (4 CDFSL datasets and 11 FSL datasets) show that we consistently set new state-of-the-art results. Code is available at https://github.com/zhenyuZ-HUST/CVPR26-Mind-the-Discriminability-Trap.

Zhenyu Zhang, Yixiong Zou, Yuhua Li, Ruixuan Li, Guangyao Chen• 2026

Related benchmarks

Task	Dataset	Result
Image Classification	EuroSAT	Top-1 Accuracy92.5	90
5-way 1-shot Classification	CD-FSL ISIC, EuroSAT, CropDisease, ChestX (test)	Accuracy (ISIC)45.01	86
5-way 5-shot Classification	CD-FSL ISIC, EuroSAT, CropDisease, ChestX (test)	Accuracy (ISIC)61.41	72
Image Classification	11-Dataset Average	Average Accuracy84	42

Showing 4 of 4 rows

Other info

Follow for update

@wizwand_team Discord