When Labels Have Structure: Improving Image Classification with Hierarchy-Aware Cross-Entropy

About

Standard cross-entropy is the default classification loss across virtually all of machine learning, yet it treats all misclassifications equally, ignoring the semantic distances that a class hierarchy encodes. We propose Hierarchy-Aware Cross-Entropy (HACE), a drop-in replacement for standard cross-entropy that incorporates a known class hierarchy directly into the loss. HACE combines two components: prediction aggregation, which propagates the model's probability mass upward through the class hierarchy so that parent nodes accumulate the confidence of their children; and ancestral label smoothing, which distributes the ground-truth signal along the path from the true class to the root. We evaluate HACE on CIFAR-100, FGVC Aircraft, and NABirds in two regimes: end-to-end training across six architectures spanning convolutional and attention-based designs, and linear probing on frozen DINOv2-Large features. In end-to-end training, HACE improves accuracy over standard cross-entropy in 15 of 18 architecture-dataset pairs, with a mean gain of 4.66%. In linear probing on frozen DINOv2-Large features, HACE outperforms all competing methods on all three datasets, with a mean improvement of 2.18% over the next best baseline.
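The two components described in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the toy hierarchy (`PARENT`, `LEAVES`), the function names, and in particular the geometric `decay` weighting used for ancestral label smoothing are assumptions made for illustration, since the abstract does not specify how the ground-truth mass is split along the path.

```python
import math

# Hypothetical toy hierarchy (child -> parent); not taken from the paper.
PARENT = {"cat": "animal", "dog": "animal", "car": "vehicle",
          "animal": "root", "vehicle": "root"}
LEAVES = ["cat", "dog", "car"]  # order matches the model's logits


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def path_to_root(node):
    """Return [node, parent, ..., root]."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def aggregate(leaf_probs):
    """Prediction aggregation: a node's probability is the sum of the
    probabilities of the leaves below it."""
    node_probs = {}
    for leaf, p in zip(LEAVES, leaf_probs):
        for node in path_to_root(leaf):
            node_probs[node] = node_probs.get(node, 0.0) + p
    return node_probs


def ancestral_targets(true_leaf, decay=0.5):
    """Ancestral label smoothing (assumed scheme): spread the target over
    the path to the root with geometrically decaying, normalized weights."""
    path = path_to_root(true_leaf)
    raw = [decay ** depth for depth in range(len(path))]
    total = sum(raw)
    return {node: w / total for node, w in zip(path, raw)}


def hierarchy_aware_ce(logits, true_leaf, decay=0.5):
    """Cross-entropy between smoothed path targets and aggregated predictions."""
    node_probs = aggregate(softmax(logits))
    targets = ancestral_targets(true_leaf, decay)
    return -sum(w * math.log(node_probs[n]) for n, w in targets.items())


# True class is "cat"; compare a sibling mistake with a distant one.
loss_sibling = hierarchy_aware_ce([0.0, 3.0, 0.0], "cat")  # predicts "dog"
loss_distant = hierarchy_aware_ce([0.0, 0.0, 3.0], "cat")  # predicts "car"
```

In this sketch, both mistakes assign the same probability to "cat", so standard cross-entropy would score them identically. The hierarchy-aware loss is lower for the sibling mistake, because the "animal" node still accumulates most of the probability mass.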

April Chan, Davide D'Ascenzo, Sebastiano Cultrera di Montesano • 2026

Related benchmarks

Task	Dataset	Metric	Value	(unlabeled)
Image Classification	CIFAR-100 (test)	--	--	429
Image Classification	CIFAR-100	--	--	375
Image Classification	FGVC	Accuracy	80.62	140
Image Classification	NABirds	Accuracy	84.96	63
Image Classification	FGVC	Top-5 Accuracy	95.23	12
Image Classification	NABirds	Top-5 Accuracy	87.67	12
Image Classification	NABirds (test)	Top-5 Accuracy	97.62	10
