On Calibrating Semantic Segmentation Models: Analyses and An Algorithm
About
We study the problem of calibrating semantic segmentation models. Many methods have been proposed to address confidence miscalibration in image classification. However, research on confidence calibration for semantic segmentation remains limited. We provide a systematic study of the calibration of semantic segmentation models and propose a simple yet effective approach. First, we find that model capacity, crop size, multi-scale testing, and prediction correctness all affect calibration. Among these, prediction correctness, and misprediction in particular, contributes most to miscalibration because mispredictions tend to be over-confident. Next, we propose a simple, unifying, and effective approach, selective scaling, which scales correct and incorrect predictions separately and focuses on smoothing the logits of mispredictions. Then, we study popular existing calibration methods and compare them with selective scaling on semantic segmentation calibration. We conduct extensive experiments on a variety of benchmarks, covering both in-domain and domain-shift calibration, and show that selective scaling consistently outperforms the other methods.
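The idea of scaling correct and incorrect predictions separately can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes that a boolean mask of likely-mispredicted pixels is already available (the abstract does not say how it is obtained), and the function name `selective_scaling` and the temperatures `t_correct` and `t_wrong` are hypothetical choices made for illustration.

```python
import numpy as np


def softmax(logits, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def selective_scaling(logits, likely_wrong, t_correct=1.0, t_wrong=2.0):
    """Apply a per-pixel temperature before the softmax.

    logits:       (num_pixels, num_classes) array of raw scores.
    likely_wrong: (num_pixels,) boolean mask, True where the prediction is
                  suspected to be a misprediction (assumed to be given).
    t_correct, t_wrong: hypothetical temperatures; a value above 1 smooths
                  the distribution and lowers the top confidence.
    """
    temps = np.where(likely_wrong, t_wrong, t_correct)[:, None]
    return softmax(logits / temps, axis=-1)


# Two example pixels with three classes; the second is flagged as likely wrong.
logits = np.array([[4.0, 1.0, 0.0],
                   [3.0, 2.5, 0.0]])
likely_wrong = np.array([False, True])

probs = selective_scaling(logits, likely_wrong)
print(probs.max(axis=1))  # top-class confidence per pixel after scaling
```

Dividing by a positive temperature does not change which class has the largest logit, so the predicted label of each pixel stays the same and only its confidence is adjusted. In this sketch the unflagged pixel keeps its original confidence because `t_correct` is 1.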
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Semantic segmentation | ADE20K (val) | mIoU 53.16 | 2731 |
| Semantic segmentation | Synthia to Cityscapes (test) | -- | 138 |
| Semantic segmentation | BDD100K (val) | mIoU 67.59 | 72 |
| Semantic segmentation | COCOStuff 164k (val) | mIoU 47.09 | 41 |
| Semantic segmentation | SN-7-TS (test) | mIoU 62.42 | 24 |
| Semantic segmentation | ADE20K to COCO-164K (test) | mIoU 9.6 | 12 |
| Semantic segmentation | BDD100K to CityScapes (test) | mIoU 67.46 | 5 |
| Semantic segmentation | DAVIS (test) | mIoU 89.33 | 5 |
| Semantic segmentation | SN-7-SP (test) | mIoU 59.42 | 5 |
| Semantic segmentation | BRATS (test) | mIoU 48.4 | 5 |