Rethinking Confidence Calibration for Failure Prediction

About

Reliable confidence estimation for predictions is important in many safety-critical applications. However, modern deep neural networks are often overconfident in their incorrect predictions. Recently, many calibration methods have been proposed to alleviate the overconfidence problem. With calibrated confidence, a primary and practical purpose is to detect misclassification errors by filtering out low-confidence predictions (known as failure prediction). In this paper, we find a general, widespread, but largely neglected phenomenon: most confidence calibration methods are useless or harmful for failure prediction. We investigate this problem and reveal that popular confidence calibration methods often lead to worse confidence separation between correct and incorrect samples, making it more difficult to decide whether to trust a prediction or not. Finally, inspired by the natural connection between flat minima and confidence separation, we propose a simple hypothesis: flat minima are beneficial for failure prediction. We verify this hypothesis via extensive experiments and further boost the performance by combining two different flat minima techniques. Our code is available at https://github.com/Impression2805/FMFP
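To make the notion of failure prediction concrete, the following is a minimal sketch of how confidence separation is commonly scored. It is not taken from the authors' repository. It assumes that confidence is the maximum softmax probability, that AUROC treats correct predictions as the positive class (with ties counted as 0.5), and that AURC is the unscaled mean error rate over all coverage levels. The function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def max_softmax_confidence(logits):
    """Return (predicted class, confidence), with confidence = max softmax probability (an assumption)."""
    probs = softmax(logits)
    pred = max(range(len(probs)), key=probs.__getitem__)
    return pred, probs[pred]


def failure_auroc(confidences, correct):
    """Probability that a random correct sample receives higher confidence
    than a random incorrect one. Ties count as 0.5 (an assumption)."""
    pos = [c for c, ok in zip(confidences, correct) if ok]
    neg = [c for c, ok in zip(confidences, correct) if not ok]
    if not pos or not neg:
        raise ValueError("need at least one correct and one incorrect sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def aurc(confidences, correct):
    """Area under the risk-coverage curve: the error rate among the k most
    confident samples, averaged over k = 1..N. Lower is better."""
    order = sorted(range(len(confidences)), key=lambda i: -confidences[i])
    errors = 0
    risks = []
    for k, i in enumerate(order, start=1):
        if not correct[i]:
            errors += 1
        risks.append(errors / k)
    return sum(risks) / len(risks)


# Toy example: one misclassified sample ranked third by confidence.
confidences = [0.9, 0.8, 0.7, 0.6]
correct = [True, True, False, True]
print(failure_auroc(confidences, correct))
print(aurc(confidences, correct))
```

In this toy example, moving the misclassified sample to the lowest confidence rank would raise the AUROC and lower the AURC, which is the kind of improved separation the abstract refers to. Reported benchmark values may use a different scaling or tie-handling convention.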

Fei Zhu, Zhen Cheng, Xu-Yao Zhang, Cheng-Lin Liu • 2023

Related benchmarks

Task	Dataset	Result
OOD Detection	CIFAR-100 standard (test)	AUROC (%): 81.54	94
Out-of-Distribution Detection	CIFAR100	AURC: 284.1	39
Failure Detection	CIFAR100 vs. SVHN	AURC Score: 345.4	39
Failure Detection	CIFAR100 (test)	AURC: 69.83	39
Failure Prediction	CIFAR100-LT IF=10 (test)	Acc: 0.6912	28
Failure Prediction	CIFAR10-LT IF=10 (test)	Accuracy: 92.04	28
Out-of-Distribution Detection	CIFAR-10 (ID) vs 6 OOD datasets (Textures, SVHN, Place365, LSUN-C, LSUN-R, iSUN) (test)	FPR@95: 26.83	24
Failure Detection	CIFAR100 Old Setting	AURC: 22.58	5
Failure Detection	CIFAR100 New FD Setting	AURC: 255.9	5

