Are We Overconfident in Models and Results for Semi-Supervised 3D Medical Image Segmentation?

About

Semi-supervised learning has become a dominant paradigm for reducing annotation costs. However, we argue that current progress is clouded by a twofold overconfidence problem. Algorithmically, mainstream pseudo-labeling frameworks often conflate prediction confidence with uncertainty, leading to severe confirmation bias. Strategically, because several benchmark datasets lack dedicated validation sets, some studies also use the test set for validation, which inflates performance estimates. Subsequent methods, compelled to adopt the same strategy in order to surpass the reported SOTA, trigger an arms race of overfitting. This raises the concern that the impressive numerical gains reported in the community may reflect overfitting rather than genuine progress. We therefore propose TCSeg, a tri-space calibrated segmentation framework built on a principled dual-axis reliability assessment engine. It explicitly decouples confidence from uncertainty and uses this signal to detect and correct confirmation bias collaboratively across the feature, probability, and image spaces. Across three benchmark datasets, TCSeg consistently delivers strong performance under existing evaluation protocols. More importantly, we advocate that the community report final-checkpoint results under multiple-run protocols, thereby establishing more rigorous and realistic benchmarks. Code will be available at github.com/DirkLiii/TCSeg.
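The reporting protocol advocated in the abstract (final-checkpoint results over multiple runs) can be illustrated with a minimal sketch. This is not code from the TCSeg repository: the function names `dice`, `summarize_runs`, and `selection_gap` are hypothetical, masks are assumed to be flat sequences of 0/1 voxel labels, and treating two empty masks as a perfect match is a convention chosen here for illustration.

```python
from statistics import mean, stdev


def dice(pred, target):
    """Dice coefficient between two binary masks given as flat 0/1 sequences."""
    pred, target = list(pred), list(target)
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def summarize_runs(final_checkpoint_scores):
    """Mean and sample standard deviation of one final-checkpoint score per run."""
    if len(final_checkpoint_scores) < 2:
        raise ValueError("need at least two runs to report a spread")
    return mean(final_checkpoint_scores), stdev(final_checkpoint_scores)


def selection_gap(per_checkpoint_test_scores):
    """Gap between the best test-set checkpoint and the final checkpoint.

    A positive gap is the amount by which selecting a checkpoint on the
    test set would inflate the reported score for this run.
    """
    return max(per_checkpoint_test_scores) - per_checkpoint_test_scores[-1]
```

With made-up scores, `summarize_runs([0.5, 0.7])` gives the mean and spread that would be reported across two runs, while `selection_gap([0.80, 0.90, 0.85])` shows how far a test-selected checkpoint can sit above the final one within a single run.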

Jun Li, Ziwei Qin • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Medical Image Segmentation | Pancreas-NIH | Dice Coefficient | 84.14 | 69 |
| Medical Image Segmentation | LA (10% labels) | Dice Score | 90.85 | 37 |
| Medical Image Segmentation | Pancreas-CT (10% labeled data (6 samples)) | Dice | 82.52 | 28 |
| 3D Medical Image Segmentation | LA 20% labeled | DSC | 91.62 | 27 |
| 3D Medical Image Segmentation | LA 8 labeled 72 unlabeled | DSC (%) | 90.85 | 27 |
| Medical Image Segmentation | BraTS 2019 (10% labeled data) | Dice Score | 86.52 | 27 |
| 3D Medical Image Segmentation | LA 16 labeled / 64 unlabeled | DSC | 91.62 | 26 |
| Medical Image Segmentation | BraTS 2019 (20% labeled data) | Dice Coefficient | 86.68 | 26 |
| 3D Medical Image Segmentation | Pancreas-CT 20% labeled | DSC | 84.14 | 22 |
| 3D Medical Image Segmentation | BraTS 2019 (25 Labeled 225 Unlabeled) | DSC | 86.52 | 11 |
Showing 10 of 12 rows
