Better Aggregation in Test-Time Augmentation

About

Test-time augmentation -- the aggregation of predictions across transformed versions of a test input -- is a common practice in image classification. Traditionally, predictions are combined using a simple average. In this paper, we present 1) experimental analyses that shed light on cases in which the simple average is suboptimal and 2) a method to address these shortcomings. A key finding is that even when test-time augmentation produces a net improvement in accuracy, it can change many correct predictions into incorrect predictions. We delve into when and why test-time augmentation changes a prediction from being correct to incorrect and vice versa. Building on these insights, we present a learning-based method for aggregating test-time augmentations. Experiments across a diverse set of models, datasets, and augmentations show that our method delivers consistent improvements over existing approaches.
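To make the contrast in the abstract concrete, the following is a minimal sketch of the two aggregation styles: the simple average and a weighted average with one weight per augmentation. It is not the authors' method. The function names, the toy probabilities, and the hand-picked weights are all assumptions made for illustration; in a learning-based approach the weights would be fit on held-out data rather than set by hand.

```python
from typing import List, Sequence

Probs = List[float]  # class probabilities predicted for one augmented view


def average_aggregate(view_probs: Sequence[Probs]) -> Probs:
    """Simple average over augmented views (the traditional baseline)."""
    n_views = len(view_probs)
    n_classes = len(view_probs[0])
    return [sum(v[c] for v in view_probs) / n_views for c in range(n_classes)]


def weighted_aggregate(view_probs: Sequence[Probs], weights: Sequence[float]) -> Probs:
    """Weighted average with one weight per augmentation.

    Hypothetical stand-in for a learned aggregator: the weights here are
    supplied by the caller, not learned.
    """
    total = sum(weights)
    n_classes = len(view_probs[0])
    return [
        sum(w * v[c] for w, v in zip(weights, view_probs)) / total
        for c in range(n_classes)
    ]


def argmax(probs: Probs) -> int:
    """Index of the highest-probability class."""
    return max(range(len(probs)), key=probs.__getitem__)


# Toy input (assumed values): three views of one image, two classes, true label 0.
views = [
    [0.60, 0.40],  # original image: correct on its own
    [0.45, 0.55],  # flipped view: incorrect
    [0.40, 0.60],  # cropped view: incorrect
]
true_label = 0

original_pred = argmax(views[0])
avg_pred = argmax(average_aggregate(views))
weighted_pred = argmax(weighted_aggregate(views, [0.7, 0.2, 0.1]))

print("original correct:", original_pred == true_label)
print("simple average correct:", avg_pred == true_label)
print("weighted average correct:", weighted_pred == true_label)
```

In this toy case the simple average turns a correct prediction into an incorrect one, which is the kind of change the abstract describes, while a weighting that trusts the original view more keeps the prediction correct.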

Divya Shanmugam, Davis Blalock, Guha Balakrishnan, John Guttag • 2020

Related benchmarks

Task | Dataset | Result | Rank
Image Captioning Evaluation | Flickr8K Expert (test) | Kendall tau_c: 51.9 | 76
Image Captioning Evaluation | Flickr8K-CF (test) | Kendall tau_b: 34.7 | 65
Image Captioning Evaluation | THumb (test) | tau_c: 20.7 | 18
Brain Tumor Segmentation | BraTS-MEN | Dice: 0.132 | 7
Brain Tumor Segmentation | BraTS-PED | Dice Score: 86.95 | 7