Logits DeConfusion with CLIP for Few-Shot Learning
About
With its powerful visual-language alignment capability, CLIP performs well on zero-shot and few-shot learning tasks. However, our experiments show that CLIP's logits suffer from serious inter-class confusion on downstream tasks, and this ambiguity between categories substantially reduces accuracy. To address this challenge, we propose Logits DeConfusion, a method that learns and removes inter-class confusion in logits by combining a Multi-level Adapter Fusion (MAF) module with an Inter-Class Deconfusion (ICD) module. MAF extracts features from different levels and fuses them uniformly to enhance the feature representation. ICD uses a residual structure to learn and remove inter-class confusion in the logits. Experimental results show that our method significantly improves classification performance and alleviates the inter-class confusion problem. The code is available at https://github.com/LiShuo1001/LDC.
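To make the idea of a residual correction on logits concrete, here is a minimal sketch in plain Python. It is not the authors' ICD module: the function name `residual_deconfusion`, the `confusion_weights` matrix, and the `scale` parameter are hypothetical, and the weights are fixed by hand here, whereas the abstract describes them as learned. The sketch only assumes that the confusion affecting each class can be estimated as a weighted sum of all class logits and then subtracted from the original logits.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def residual_deconfusion(logits: Vector,
                         confusion_weights: Matrix,
                         scale: float = 1.0) -> Vector:
    """Return logits minus an estimated inter-class confusion term.

    Hypothetical illustration only: confusion_weights[i][j] says how much
    of class j's logit is assumed to leak into class i's logit.
    """
    num_classes = len(logits)
    if len(confusion_weights) != num_classes or any(
        len(row) != num_classes for row in confusion_weights
    ):
        raise ValueError("confusion_weights must be a square matrix "
                         "matching the number of classes")

    # Estimated confusion for each class: weighted sum of all class logits.
    confusion = [
        sum(w * z for w, z in zip(row, logits))
        for row in confusion_weights
    ]

    # Residual structure: keep the original logits and subtract the estimate.
    return [z - scale * c for z, c in zip(logits, confusion)]


def argmax(values: Vector) -> int:
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


if __name__ == "__main__":
    # Class 1 is assumed to absorb 20% of class 0's logit.
    raw = [2.0, 2.1, 0.5]
    weights = [
        [0.0, 0.0, 0.0],
        [0.2, 0.0, 0.0],
        [0.0, 0.0, 0.0],
    ]
    corrected = residual_deconfusion(raw, weights)
    print(argmax(raw), argmax(corrected))
```

With all-zero weights the function returns the input logits unchanged, which is the usual appeal of a residual form: the correction starts from the original prediction and only adjusts it where confusion is estimated.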
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Image Classification | ImageNet V2 | -- | 611 |
| Image Classification | ImageNet | Top-1 Accuracy: 73.88 | 80 |
| 5-way 1-shot Classification | CD-FSL ISIC, EuroSAT, CropDisease, ChestX (test) | Accuracy (ISIC): 33.72 | 74 |
| 5-way 5-shot Classification | CD-FSL ISIC, EuroSAT, CropDisease, ChestX (test) | Accuracy (ISIC): 49.7 | 60 |
| Few-shot Image Classification | Average 11 datasets (test) | Average Accuracy (Few-shot): 77.17 | 47 |
| Image Classification | 11-Dataset Average | Average Accuracy: 72.5 | 42 |
| Few-shot Image Classification | CD-FSL 5-way 5-shot (test) | ChestX Accuracy: 25.89 | 38 |
| Few-shot Image Classification | CD-FSL 5-way 1-shot (test) | ChestX Accuracy: 22.12 | 38 |
| Image Classification | ImageNet-Sketch | Accuracy: 48.85 | 32 |
| Tactile Recognition | Tactile Cross-Domain OF Real to X Unseen target domains | Average ACC: 52.2 | 22 |