Bin-wise Temperature Scaling (BTS): Improvement in Confidence Calibration Performance through Simple Scaling Techniques
About
The reliability of neural network predictions is important in many applications. In safety-critical domains in particular, such as cancer prediction or autonomous driving, a reliable confidence estimate for the model's prediction is critical for interpreting the results. Modern deep neural networks have significantly improved performance on many image classification tasks. However, these networks tend to be poorly calibrated in terms of output confidence. Temperature scaling is an efficient post-processing calibration scheme that yields well-calibrated results. In this study, we leverage the concept of temperature scaling to build a sophisticated bin-wise scaling method. Furthermore, we augment the validation samples to obtain a more elaborate scaling. The proposed methods consistently improve calibration performance across various datasets and deep convolutional neural network models.
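To make the idea concrete, the following is a minimal sketch of standard temperature scaling alongside one possible bin-wise variant. The abstract does not specify how bins are formed or how each temperature is fitted, so the choices here are assumptions: samples are grouped into equal-mass bins by their uncalibrated top-class confidence, and each temperature is picked by a grid search that minimizes negative log-likelihood on validation data. All function names are hypothetical, and the validation-sample augmentation mentioned above is not covered.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Row-wise softmax of logits divided by a scalar temperature T."""
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(logits, labels, T=1.0):
    """Mean negative log-likelihood of the true labels at temperature T."""
    p = softmax(logits, T)
    return -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))

def fit_temperature(logits, labels, grid=np.linspace(0.5, 5.0, 91)):
    """Standard temperature scaling: one scalar T chosen by grid search on NLL."""
    losses = [nll(logits, labels, T) for T in grid]
    return float(grid[int(np.argmin(losses))])

def assign_bins(logits, edges):
    """Map each sample to a bin index using its uncalibrated top-class confidence."""
    conf = softmax(logits).max(axis=1)
    n_bins = len(edges) - 1
    return np.clip(np.searchsorted(edges, conf, side="right") - 1, 0, n_bins - 1)

def fit_binwise_temperatures(logits, labels, n_bins=3):
    """Assumed bin-wise variant: equal-mass confidence bins, one T per bin."""
    conf = softmax(logits).max(axis=1)
    edges = np.quantile(conf, np.linspace(0.0, 1.0, n_bins + 1))
    edges[0], edges[-1] = 0.0, 1.0  # make the outer bins cover the whole range
    bin_ids = assign_bins(logits, edges)
    # Fall back to the global temperature for any bin that ends up empty.
    temps = np.full(n_bins, fit_temperature(logits, labels))
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            temps[b] = fit_temperature(logits[mask], labels[mask])
    return edges, temps

def apply_binwise(logits, edges, temps):
    """Scale each sample's logits by the temperature of its confidence bin."""
    bin_ids = assign_bins(logits, edges)
    return softmax(logits / temps[bin_ids][:, None])
```

In this sketch, `fit_temperature` or `fit_binwise_temperatures` would be called on held-out validation logits and labels, and `apply_binwise` on test logits. Because each sample's logits are divided by a single positive scalar, the predicted class is unchanged and only the confidence is adjusted.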
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Image Classification Calibration | CIFAR100 | Classwise ECE | 0.0387 | 99 |
| Calibration | Tabular datasets | NLL | 0.324 | 21 |
| Image Classification Calibration | ImageNet | Accuracy | 78.99 | 15 |
| Text Classification | IMDB binary sentiment (five random splits) | NLL | 0.301 | 11 |
| Text Classification | Emotion multi-class (five random splits) | NLL | 0.156 | 9 |
| Image Classification Calibration | BloodMNIST | NLL | 0.3305 | 9 |