Logits DeConfusion with CLIP for Few-Shot Learning

About

With its powerful vision-language alignment capability, CLIP performs well on zero-shot and few-shot learning tasks. However, we find experimentally that CLIP's logits suffer from severe inter-class confusion on downstream tasks, and this ambiguity between categories substantially degrades accuracy. To address this challenge, we propose a novel method called Logits DeConfusion, which learns and eliminates inter-class confusion in the logits by combining a Multi-level Adapter Fusion (MAF) module with an Inter-Class Deconfusion (ICD) module. MAF extracts features from different levels and fuses them uniformly to enhance the feature representation. ICD uses a learnable residual structure to remove inter-class confusion from the logits. Experimental results show that our method significantly improves classification performance and alleviates the inter-class confusion problem. The code is available at https://github.com/LiShuo1001/LDC.
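The following minimal NumPy sketch makes the two modules concrete. It is not the authors' implementation; for that, see the linked repository. It rests on three assumptions. First, MAF applies one linear adapter per feature level and fuses the adapted features by uniform averaging. Second, ICD estimates the confusion term as a learnable class-by-class linear map of the logits. Third, ICD subtracts that term through a residual connection. The function names `maf_fuse` and `icd_deconfuse`, the toy shapes, and the random weights are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def maf_fuse(level_features, adapters):
    """Hypothetical MAF sketch (assumption): one linear adapter per feature
    level, fused by uniform (equal-weight) averaging."""
    adapted = [feat @ w for feat, w in zip(level_features, adapters)]
    return np.mean(adapted, axis=0)


def icd_deconfuse(logits, w_conf):
    """Hypothetical ICD sketch (assumption): estimate inter-class confusion as a
    learnable C x C linear map of the logits, then remove it via a residual path."""
    confusion = logits @ w_conf
    return logits - confusion


# Toy sizes (assumed): 4 images, 3 feature levels of width 8, 5 classes.
batch, dim, num_classes, num_levels = 4, 8, 5, 3
level_features = [rng.standard_normal((batch, dim)) for _ in range(num_levels)]
adapters = [0.1 * rng.standard_normal((dim, dim)) for _ in range(num_levels)]
# Random stand-in for CLIP class text embeddings (normalization and temperature omitted).
text_embeddings = rng.standard_normal((num_classes, dim))

fused = maf_fuse(level_features, adapters)  # shape (batch, dim)
logits = fused @ text_embeddings.T          # shape (batch, num_classes)

# A zero-initialized confusion matrix leaves the logits unchanged.
w_conf = np.zeros((num_classes, num_classes))
deconfused = icd_deconfuse(logits, w_conf)
print(np.allclose(deconfused, logits))  # True
```

Initializing the confusion matrix to zero makes the residual ICD sketch start as an identity mapping, a common choice for residual modules. In a real few-shot setup, the adapter and confusion weights would presumably be trained on the labeled support examples.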

Shuo Li, Fang Liu, Zehua Hao, Xinyi Wang, Lingling Li, Xu Liu, Puhua Chen, Wenping Ma (2025)

Related benchmarks

Task	Dataset	Metric	Result
Image Classification	ImageNet V2	--	--	749
Image Classification	ImageNet	Top-1 Accuracy	73.88	343
Image Classification	EuroSAT	Top-1 Accuracy	78.4	90
5-way 1-shot Classification	CD-FSL ISIC, EuroSAT, CropDisease, ChestX (test)	Accuracy (ISIC)	33.72	86
5-way 5-shot Classification	CD-FSL ISIC, EuroSAT, CropDisease, ChestX (test)	Accuracy (ISIC)	49.7	72
Image Classification	ImageNet-Sketch	Accuracy	48.85	63
Few-shot Image Classification	Average 11 datasets (test)	Average Accuracy (Few-shot)	77.17	47
Image Classification	11-Dataset Average	Average Accuracy	72.5	42
Few-shot Image Classification	CD-FSL 5-way 5-shot (test)	ChestX Accuracy	25.89	38
Few-shot Image Classification	CD-FSL 5-way 1-shot (test)	ChestX Accuracy	22.12	38

