Deep Probabilistic Supervision for Image Classification
About
Supervised training of deep neural networks for classification typically relies on hard targets, which promote overconfidence and can limit calibration, generalization, and robustness. Self-distillation methods aim to mitigate this by leveraging the inter-class and sample-specific information present in the model's own predictions, but they often remain dependent on hard targets and do not explicitly model predictive uncertainty. With this in mind, we propose Deep Probabilistic Supervision (DPS), a principled learning framework that constructs sample-specific target distributions via statistical inference on the model's own predictions and remains independent of hard targets after initialization. We show that DPS consistently yields higher test accuracy (e.g., +2.0% for DenseNet-264 on ImageNet) and significantly lower Expected Calibration Error (ECE) (e.g., -40% for ResNet-50 on CIFAR-100) than existing self-distillation methods. When combined with a contrastive loss, DPS achieves state-of-the-art robustness under label noise.
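The paper's exact inference procedure is not reproduced on this page. As a rough illustration of the general idea, a self-distillation-style sketch is shown below, in which per-sample soft targets are maintained as an exponential moving average of the model's own temperature-softened predictions and the model is trained against them with a KL divergence loss. The `soften`, `update_targets`, and `kl_loss` helpers, the momentum/temperature values, and the EMA update itself are illustrative assumptions, not the DPS algorithm.

```python
import numpy as np

def soften(logits, temperature=2.0):
    # Temperature-scaled softmax: higher T spreads mass across classes.
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    p = np.exp(z)
    return p / p.sum(axis=-1, keepdims=True)

def update_targets(targets, logits, momentum=0.9, temperature=2.0):
    # Hypothetical stand-in for a sample-specific target update:
    # an EMA of the model's own softened predictions. After the
    # one-hot initialization, the targets no longer use hard labels.
    preds = soften(logits, temperature)
    return momentum * targets + (1.0 - momentum) * preds

def kl_loss(targets, logits, eps=1e-12):
    # KL(targets || model predictions), averaged over the batch.
    log_p = np.log(soften(logits, temperature=1.0) + eps)
    log_q = np.log(targets + eps)
    return float(np.mean(np.sum(targets * (log_q - log_p), axis=-1)))

# Usage: initialize targets from one-hot labels, then refresh each epoch.
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 5))            # batch of 4, 5 classes
targets = np.eye(5)[[0, 1, 2, 3]]           # one-hot initialization
targets = update_targets(targets, logits)   # now soft, sample-specific
loss = kl_loss(targets, logits)
```

Each target row stays a valid probability distribution (a convex combination of two distributions), and the KL loss is non-negative by construction.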
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Classification | CIFAR-100 (test) | Accuracy | 89.54 | 3518 |
| Image Classification | CIFAR-10 (test) | Accuracy | 98.44 | 3381 |
| Image Classification | TinyImageNet (test) | Accuracy | 89.65 | 366 |
| Image Classification | ImageNet (test) | Top-1 Accuracy | 79.88 | 291 |
| Calibration | CIFAR-100 (test) | ECE | 0.85 | 99 |
| Out-of-Distribution Detection | CIFAR-10 (ID) vs SVHN (OOD) (test) | AUROC | 98.03 | 79 |
| Image Classification | CIFAR-10-C (test) | Accuracy (Clean) | 91.57 | 61 |
| Image Classification | CIFAR-10, 40% asymmetric noise (test) | Final Accuracy | 95.6 | 42 |
| Out-of-Distribution Detection | CIFAR-100 (ID) vs SVHN (OOD) (test) | AUROC | 90.71 | 40 |
| Image Classification | CIFAR-10, 50% symmetric noise (test) | Accuracy (Test) | 0.962 | 36 |
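The calibration row above reports ECE, which measures the gap between a model's confidence and its actual accuracy. A minimal sketch of the standard equal-width binning estimator is shown below; the function name and the choice of 15 bins are illustrative conventions, not details taken from the paper (which may report ECE in percent).

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=15):
    # Equal-width binning ECE: the weighted average, over confidence
    # bins, of |accuracy - mean confidence| within each bin.
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    n = len(confidences)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            acc = correct[mask].mean()    # accuracy in this bin
            conf = confidences[mask].mean()  # avg confidence in this bin
            ece += (mask.sum() / n) * abs(acc - conf)
    return ece

# A perfectly calibrated batch (80% confident, 80% correct) scores 0.
conf = np.full(10, 0.8)
corr = np.array([1, 1, 1, 1, 1, 1, 1, 1, 0, 0], dtype=float)
print(expected_calibration_error(conf, corr))  # -> 0.0
```

Lower is better: a model that is always 100% confident but always wrong would score an ECE of 1.0 under this estimator.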