COME: Test-time adaption by Conservatively Minimizing Entropy
About
Machine learning models must continuously adapt to novel data distributions in the open world. Entropy minimization (EM), the predominant principle, has proven to be a simple yet effective cornerstone of existing test-time adaption (TTA) methods. Unfortunately, its fatal limitation, overconfidence, tends to cause model collapse. To address this issue, we propose to Conservatively Minimize the Entropy (COME), a simple drop-in replacement for traditional EM that addresses this limitation. In essence, COME explicitly models uncertainty by characterizing a Dirichlet prior distribution over model predictions during TTA. By doing so, COME naturally regularizes the model to favor conservative confidence on unreliable samples. Theoretically, we provide a preliminary analysis showing that COME enhances optimization stability by introducing a data-adaptive lower bound on the entropy. Empirically, our method achieves state-of-the-art performance on commonly used benchmarks, with significant improvements in both classification accuracy and uncertainty estimation across standard, lifelong, and open-world TTA settings: up to $34.5\%$ improvement in accuracy and $15.1\%$ in false positive rate.
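The contrast between standard EM and a Dirichlet-based view of predictions can be made concrete by computing both quantities from raw logits. The sketch below assumes the subjective-logic convention common in evidential deep learning: evidence $e_k = \exp(z_k)$, concentration $\alpha_k = e_k + 1$, strength $S = \sum_k \alpha_k$, belief $b_k = e_k / S$, and uncertainty mass $u = K / S$. The function names are hypothetical, and this is an illustration of the idea, not the authors' implementation or their full COME objective.

```python
import math
from typing import List, Sequence, Tuple


def _logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def softmax_entropy(logits: Sequence[float]) -> float:
    """Shannon entropy of softmax(logits): the quantity standard EM minimizes."""
    log_z = _logsumexp(logits)
    probs = [math.exp(z - log_z) for z in logits]
    return -sum(p * math.log(p) for p in probs if p > 0)


def dirichlet_opinion(logits: Sequence[float]) -> Tuple[List[float], float]:
    """Map logits to (beliefs, uncertainty) under an ASSUMED evidential convention.

    evidence e_k = exp(z_k), alpha_k = e_k + 1, S = sum(alpha) = sum(e) + K,
    b_k = e_k / S, u = K / S. Computed in log space to avoid overflow.
    """
    k = len(logits)
    log_s = _logsumexp(list(logits) + [math.log(k)])  # log(sum(exp(z)) + K)
    beliefs = [math.exp(z - log_s) for z in logits]
    uncertainty = math.exp(math.log(k) - log_s)
    return beliefs, uncertainty


def opinion_entropy(logits: Sequence[float]) -> float:
    """Entropy over the K + 1 masses (b_1, ..., b_K, u), which sum to 1."""
    beliefs, u = dirichlet_opinion(logits)
    return -sum(p * math.log(p) for p in beliefs + [u] if p > 0)


if __name__ == "__main__":
    # Both samples have a uniform softmax, but very different total evidence.
    for name, z in [("no evidence", [0.0, 0.0, 0.0]), ("strong, split", [5.0, 5.0, 5.0])]:
        _, u = dirichlet_opinion(z)
        print(f"{name:>14}: softmax H={softmax_entropy(z):.4f}  "
              f"u={u:.4f}  opinion H={opinion_entropy(z):.4f}")
```

Softmax entropy is unchanged when the same constant is added to every logit, so plain EM cannot tell a sample with no evidence from one with strong but evenly split evidence. The uncertainty mass $u$ separates the two cases. Because all $K+1$ masses are nonnegative, the opinion entropy can never fall below $-u \log u$, which gives one elementary picture of how an evidence-dependent floor on the entropy can arise; it is an illustration only, not the paper's theorem.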
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Mathematical Reasoning | CollegeMATH | Accuracy | 25.42 | 276 |
| Mathematical Reasoning | AIME 24 | Accuracy | 6.67 | 154 |
| Image Classification | ImageNet-C level 5 | Avg Top-1 Acc | 58.5 | 110 |
| Reasoning | GSM8K | -- | -- | 106 |
| Image Classification | ImageNet-C Severity 5 (test) | Mean Error Rate | 43 | 104 |
| Reasoning | MATH 500 | Accuracy (%) | 48.8 | 90 |
| Mathematical Reasoning | Minerva | Accuracy | 20.96 | 62 |
| Medical Image Segmentation | REFUGE | Dice Score | 0.8304 | 49 |
| Image Classification | ImageNet-C blind-spot subset level 5 | Accuracy (N) | 41.4 | 35 |
| Medical Image Segmentation | BraTS-PED | Dice Score | 86.71 | 29 |