When Labels Have Structure: Improving Image Classification with Hierarchy-Aware Cross-Entropy
About
Standard cross-entropy is the default classification loss across virtually all of machine learning, yet it treats all misclassifications equally, ignoring the semantic distances that a class hierarchy encodes. We propose Hierarchy-Aware Cross-Entropy (HACE), a drop-in replacement for standard cross-entropy that incorporates a known class hierarchy directly into the loss. HACE combines two components: prediction aggregation, which propagates the model's probability mass upward through the class hierarchy so that each parent node accumulates the confidence of its children; and ancestral label smoothing, which distributes the ground-truth signal along the path from the true class to the root. We evaluate HACE on CIFAR-100, FGVC Aircraft, and NABirds in two regimes: end-to-end training of six architectures spanning convolutional and attention-based designs, and linear probing on frozen DINOv2-Large features. In end-to-end training, HACE improves accuracy over standard cross-entropy in 15 of the 18 architecture-dataset pairs (six architectures × three datasets), with a mean gain of 4.66%. In linear probing, HACE outperforms all competing methods on all three datasets, with a mean improvement of 2.18% over the next-best baseline.
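The two components can be illustrated with a minimal sketch. It assumes the hierarchy is given as a child-to-parent map, that the model outputs logits over leaf classes only, and that ancestral label smoothing places `1 - alpha` on the true leaf and spreads `alpha` uniformly over the leaf's non-root ancestors (the root always receives aggregated probability 1, so it would add nothing to the loss). The weighting scheme, the toy hierarchy, and the names `aggregate`, `hace_loss`, and `alpha` are hypothetical illustrations, not the paper's exact formulation.

```python
import math
from typing import Dict, List, Optional

Parent = Dict[str, Optional[str]]  # child -> parent; the root maps to None


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over the leaf-class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def path_to_root(node: str, parent: Parent) -> List[str]:
    """Return [node, parent(node), ..., root]."""
    path = [node]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def aggregate(leaf_probs: Dict[str, float], parent: Parent) -> Dict[str, float]:
    """Prediction aggregation: every node gets the summed probability of its leaf descendants."""
    node_probs: Dict[str, float] = {}
    for leaf, p in leaf_probs.items():
        for node in path_to_root(leaf, parent):
            node_probs[node] = node_probs.get(node, 0.0) + p
    return node_probs


def hace_loss(logits: List[float], leaves: List[str], true_leaf: str,
              parent: Parent, alpha: float = 0.1) -> float:
    """Cross-entropy between a smoothed ancestral target and aggregated predictions (hypothetical scheme)."""
    node_probs = aggregate(dict(zip(leaves, softmax(logits))), parent)
    ancestors = path_to_root(true_leaf, parent)[1:-1]  # drop the leaf itself and the root
    if not ancestors:  # leaf hangs directly off the root: plain cross-entropy
        targets = {true_leaf: 1.0}
    else:
        targets = {true_leaf: 1.0 - alpha}
        for a in ancestors:
            targets[a] = alpha / len(ancestors)
    return -sum(w * math.log(node_probs[n]) for n, w in targets.items())


# Toy two-level hierarchy (hypothetical): root -> {animal, vehicle} -> leaves.
PARENT: Parent = {"root": None, "animal": "root", "vehicle": "root",
                  "cat": "animal", "dog": "animal", "car": "vehicle", "plane": "vehicle"}
LEAVES = ["cat", "dog", "car", "plane"]

if __name__ == "__main__":
    near = hace_loss([0.0, 3.0, 0.0, 0.0], LEAVES, "cat", PARENT, alpha=0.5)  # cat mistaken for dog
    far = hace_loss([0.0, 0.0, 3.0, 0.0], LEAVES, "cat", PARENT, alpha=0.5)   # cat mistaken for car
    print(f"near miss: {near:.4f}  far miss: {far:.4f}")
```

With `alpha = 0` the target collapses onto the true leaf and the loss reduces to standard cross-entropy. With `alpha > 0`, the two mistakes above assign the same probability to "cat" (so standard cross-entropy scores them identically), but confusing a cat with a dog is penalized less than confusing it with a car, because the aggregated "animal" probability stays high.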
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Classification | CIFAR-100 (test) | -- | -- | 395 |
| Image Classification | CIFAR-100 | -- | -- | 357 |
| Image Classification | FGVC | Accuracy | 80.62 | 111 |
| Image Classification | NABirds | Accuracy | 84.96 | 63 |
| Image Classification | FGVC | Top-5 Accuracy | 95.23 | 12 |
| Image Classification | NABirds | Top-5 Accuracy | 87.67 | 12 |
| Image Classification | NABirds (test) | Top-5 Accuracy | 97.62 | 10 |