Better than Average: Spatially-Aware Aggregation of Segmentation Uncertainty Improves Downstream Performance

About

Uncertainty Quantification (UQ) is crucial for ensuring the reliability of automated image segmentations in safety-critical domains like biomedical image analysis or autonomous driving. In segmentation, UQ generates pixel-wise uncertainty scores that must be aggregated into image-level scores for downstream tasks like Out-of-Distribution (OoD) or failure detection. Despite routine use of aggregation strategies, their properties and impact on downstream task performance have not yet been comprehensively studied. Global Average is the default choice, yet it does not account for spatial and structural features of segmentation uncertainty. Alternatives like patch-, class- and threshold-based strategies exist, but lack systematic comparison, leading to inconsistent reporting and unclear best practices. We address this gap by (1) formally analyzing properties, limitations, and pitfalls of common strategies; (2) proposing novel strategies that incorporate spatial uncertainty structure; and (3) benchmarking their performance on OoD and failure detection across ten datasets that vary in image geometry and structure. We find that aggregators leveraging spatial structure yield stronger performance in both downstream tasks studied. However, the performance of individual aggregators depends heavily on dataset characteristics, so we (4) propose a meta-aggregator that integrates multiple aggregators and performs robustly across datasets.
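
To make the aggregation step concrete, the following is a minimal sketch of three aggregator families named above (global average, patch-based, threshold-based), each reducing a pixel-wise uncertainty map to one image-level score. It is an illustration under stated assumptions, not the authors' implementation: the function names, the patch size, the threshold `tau`, and the choice of "maximum patch mean" as the patch-based rule are all hypothetical.

```python
import numpy as np


def global_average(u):
    """Mean of all pixel-wise uncertainty scores (the default aggregator)."""
    return float(u.mean())


def patch_max(u, patch=4):
    """Highest mean uncertainty over non-overlapping patch x patch tiles.

    Assumed patch-based rule for illustration; other variants are possible.
    """
    h, w = u.shape
    # Crop so that both sides are divisible by the patch size.
    hc, wc = h - h % patch, w - w % patch
    tiles = u[:hc, :wc].reshape(hc // patch, patch, wc // patch, patch)
    # Average within each tile, then take the most uncertain tile.
    return float(tiles.mean(axis=(1, 3)).max())


def threshold_average(u, tau=0.5):
    """Mean uncertainty over pixels scoring above tau (0.0 if none do)."""
    mask = u > tau
    return float(u[mask].mean()) if mask.any() else 0.0


# Toy map: one compact, highly uncertain region on a certain background.
u = np.zeros((8, 8))
u[0:4, 0:4] = 0.8

scores = {
    "global_average": global_average(u),
    "patch_max": patch_max(u, patch=4),
    "threshold_average": threshold_average(u, tau=0.5),
}
print(scores)
```

On this toy map the global average is diluted by the certain background (about 0.2), while the patch-based and threshold-based scores stay near 0.8. That dilution is the kind of spatial insensitivity the abstract attributes to Global Average.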

Vanessa Emanuela Guarino, Claudia Winklmayr, Jannik Franzen, Josef Lorenz Rumberger, Manuel Pfeuffer, Sonja Greven, Klaus Maier-Hein, Carsten T. Lüth, Christoph Karg, Dagmar Kainmueller • 2026

Related benchmarks

Task                           Dataset    Result        Rank
Failure Detection              WORM-Nem   E-AURC 13     48
Failure Detection              ARC-Nuc    E-AURC 0.04   48
Out-of-Distribution Detection  ARC-Nuc    AUROC 88      48
Failure Detection              LIDC-Mal   E-AURC 0.07   48
Out-of-Distribution Detection  ARC BC     AUROC 72      48
Failure Detection              CAR-CS     E-AURC 8      48
Out-of-Distribution Detection  LIDC-Tex   AUROC 78      48
Out-of-Distribution Detection  LIDC-Mal   AUROC 90      48
Failure Detection              ARC BC     E-AURC 7      48
Failure Detection              LIDC-Tex   E-AURC 11     48
Showing 10 of 31 rows
