Curriculum Fine-tuning of Vision Foundation Model for Medical Image Classification Under Label Noise
About
Deep neural networks have demonstrated remarkable performance in various vision tasks, but their success heavily depends on the quality of the training data. Noisy labels are a critical issue in medical datasets and can significantly degrade model performance. Previous clean sample selection methods have not utilized the well pre-trained features of vision foundation models (VFMs) and assumed that training begins from scratch. In this paper, we propose CUFIT, a curriculum fine-tuning paradigm of VFMs for medical image classification under label noise. Our method is motivated by the fact that linear probing of VFMs is relatively unaffected by noisy samples, as it does not update the feature extractor of the VFM, thus robustly classifying the training samples. Subsequently, curriculum fine-tuning of two adapters is conducted, starting with clean sample selection from the linear probing phase. Our experimental results demonstrate that CUFIT outperforms previous methods across various medical image benchmarks. Specifically, our method surpasses previous baselines by 5.0%, 2.1%, 4.6%, and 5.8% at a 40% noise rate on the HAM10000, APTOS-2019, BloodMnist, and OrgancMnist datasets, respectively. Furthermore, we provide extensive analyses to demonstrate the impact of our method on noisy label detection. For instance, our method shows higher label precision and recall compared to previous approaches. Our work highlights the potential of leveraging VFMs in medical image classification under challenging conditions of noisy labels.
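The selection step and the label precision and recall mentioned in the abstract can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not spell out: a sample is treated as "clean" when a module's predicted class agrees with its given label, label precision is the fraction of selected samples whose given label is correct, and label recall is the fraction of correctly labelled samples that get selected. The function names, the toy labels, and the prediction lists are all hypothetical and only serve to show the bookkeeping; no model is trained here.

```python
from typing import List, Sequence, Tuple


def select_by_agreement(predictions: Sequence[int],
                        given_labels: Sequence[int]) -> List[int]:
    """Return indices whose predicted class matches the given (possibly noisy) label.

    Assumption: agreement between prediction and given label is the
    clean-sample criterion; the abstract does not state the exact rule.
    """
    return [i for i, (p, y) in enumerate(zip(predictions, given_labels)) if p == y]


def label_precision_recall(selected: Sequence[int],
                           given_labels: Sequence[int],
                           true_labels: Sequence[int]) -> Tuple[float, float]:
    """Compute label precision and recall of a selection.

    precision = selected samples that are truly clean / all selected samples
    recall    = selected samples that are truly clean / all truly clean samples
    """
    clean = {i for i, (y, t) in enumerate(zip(given_labels, true_labels)) if y == t}
    chosen = set(selected)
    hits = len(chosen & clean)
    precision = hits / len(chosen) if chosen else 0.0
    recall = hits / len(clean) if clean else 0.0
    return precision, recall


# Toy data (made up): indices 3 and 5 carry corrupted labels.
true_labels = [0, 1, 2, 1, 0, 2]
given_labels = [0, 1, 2, 0, 0, 1]

# Hypothetical predictions from the linear-probing phase (frozen feature extractor).
probe_preds = [0, 1, 2, 1, 1, 1]
stage1 = select_by_agreement(probe_preds, given_labels)

# Hypothetical predictions from the first adapter, trained on the stage1 subset.
adapter_preds = [0, 1, 2, 1, 0, 2]
stage2 = select_by_agreement(adapter_preds, given_labels)

stage1_pr = label_precision_recall(stage1, given_labels, true_labels)
stage2_pr = label_precision_recall(stage2, given_labels, true_labels)
print("stage 1:", stage1, stage1_pr)
print("stage 2:", stage2, stage2_pr)
```

In this toy setup, the second selection is cleaner than the first, which is the intuition behind a curriculum: each stage hands a more reliable subset to the next module. The numbers are illustrative only and say nothing about the results reported in the paper.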
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Skin lesion classification | HAM10000 (test) | Accuracy | 82.6 | 83 |
| Medical Image Classification | MedMnist BloodMnist (test) | Accuracy | 99 | 65 |
| Medical Image Classification | OrgancMnist MedMnist (test) | Test Accuracy | 93.7 | 35 |
| Medical Image Classification | APTOS 2019 (test) | Test Accuracy | 84.2 | 35 |
| Image Classification | CIFAR-100 80% symmetric noise (test) | Accuracy | 73.8 | 24 |
| Medical Image Classification | FGADR Kaggle-EyePACS (test) | Accuracy | 53.7 | 7 |
| Natural Image Classification | CIFAR10 Symmetric noise 80% (test) | Accuracy | 83.9 | 7 |
| Natural Image Classification | ANIMAL10N Real-world noisy labels (test) | Accuracy | 92.3 | 7 |
| Medical Image Classification | APTOS Kaggle-EyePACS 2019 (test) | Accuracy | 69.8 | 7 |