Cross-Entropy Loss Functions: Theoretical Analysis and Applications

About

Cross-entropy is a widely used loss function in applications. It coincides with the logistic loss applied to the outputs of a neural network, when the softmax is used. But, what guarantees can we rely on when using cross-entropy as a surrogate loss? We present a theoretical analysis of a broad family of loss functions, comp-sum losses, that includes cross-entropy (or logistic loss), generalized cross-entropy, the mean absolute error and other cross-entropy-like loss functions. We give the first $H$-consistency bounds for these loss functions. These are non-asymptotic guarantees that upper bound the zero-one loss estimation error in terms of the estimation error of a surrogate loss, for the specific hypothesis set $H$ used. We further show that our bounds are tight. These bounds depend on quantities called minimizability gaps. To make them more explicit, we give a specific analysis of these gaps for comp-sum losses. We also introduce a new family of loss functions, smooth adversarial comp-sum losses, that are derived from their comp-sum counterparts by adding in a related smooth term. We show that these loss functions are beneficial in the adversarial setting by proving that they admit $H$-consistency bounds. This leads to new adversarial robustness algorithms that consist of minimizing a regularized smooth adversarial comp-sum loss. While our main purpose is a theoretical analysis, we also present an extensive empirical analysis comparing comp-sum losses. We further report the results of a series of experiments demonstrating that our adversarial robustness algorithms outperform the current state-of-the-art, while also achieving a superior non-adversarial accuracy.

Anqi Mao, Mehryar Mohri, Yutao Zhong (2023)
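
The abstract names the comp-sum family without giving its formula. The sketch below assumes a parameterization commonly given for this family, with a single parameter `tau` applied to the sum of exponentiated score differences: `tau = 0` gives the sum-exponential loss, `tau = 1` the logistic (cross-entropy) loss, `1 < tau < 2` generalized cross-entropy, and `tau = 2` the mean absolute error up to a constant factor. The function names and the exact form are illustrative assumptions, not taken from this page.

```python
import numpy as np


def comp_sum_loss(scores, y, tau):
    """Comp-sum loss for one example (assumed parameterization, see lead-in).

    scores: 1-D array of class scores h(x, y') for every class y'.
    y:      integer index of the true class.
    tau:    non-negative family parameter.
    """
    scores = np.asarray(scores, dtype=float)
    # u = sum over y' != y of exp(h(x, y') - h(x, y)), i.e. 1 / softmax_y - 1.
    u = np.exp(np.delete(scores, y) - scores[y]).sum()
    if tau == 1:
        # Limit case of the family: log(1 + u) = -log softmax_y.
        return np.log1p(u)
    return ((1.0 + u) ** (1.0 - tau) - 1.0) / (1.0 - tau)


def softmax_cross_entropy(scores, y):
    """Reference cross-entropy on softmax outputs, computed stably."""
    scores = np.asarray(scores, dtype=float)
    shifted = scores - scores.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[y]


if __name__ == "__main__":
    s, label = [2.0, 0.5, -1.0], 0
    print("tau=1 (logistic):", comp_sum_loss(s, label, 1))
    print("softmax cross-entropy:", softmax_cross_entropy(s, label))
    print("tau=2 (1 - softmax_y):", comp_sum_loss(s, label, 2))
```

Under this assumed form, the `tau = 1` case matches the abstract's statement that cross-entropy coincides with the logistic loss applied to softmax outputs, and the `tau = 2` case reduces to one minus the softmax probability of the true class.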

Related benchmarks

Task | Dataset | Result | Rank
Image Classification | ImageNet LT | Top-1 Accuracy: 41.6 | 251
Image Classification | CIFAR-100 LT (IF=50) | Top-1 Acc: 43.9 | 25
Emotion Classification | Combined speech dataset (Baker, LJSpeech, ESD, CREMA-D, Genshin Impact) 1.0 (subject-independent) | Accuracy (B): 0.613 | 19
Language Classification | Combined speech dataset (Baker, LJSpeech, ESD, CREMA-D, Genshin Impact) 1.0 (subject-independent) | Balanced Acc: 0.923 | 19
Gender Classification | Combined speech dataset (Baker, LJSpeech, ESD, CREMA-D, Genshin Impact) 1.0 (subject-independent) | Balanced Acc: 0.768 | 19
Age Classification | Combined speech dataset (Baker, LJSpeech, ESD, CREMA-D, Genshin Impact) 1.0 (subject-independent) | Acc (B): 0.235 | 19
Image Classification | CIFAR-100 LT (IF=100) | Top-1 Acc: 38.43 | 13
Image Classification | iNaturalist 2018 | Top-1 Accuracy (Overall): 61.7 | 12
Image Classification | CIFAR-100-LT (IF=200) | Top-1 Acc: 34.84 | 9