COME: Test-time adaption by Conservatively Minimizing Entropy

About

Machine learning models must continuously adjust to novel data distributions in the open world. As the predominant principle, entropy minimization (EM) has proven to be a simple yet effective cornerstone of existing test-time adaption (TTA) methods. Unfortunately, its fatal limitation (i.e., overconfidence) tends to result in model collapse. To address this issue, we propose to Conservatively Minimize the Entropy (COME), a simple drop-in replacement for traditional EM. In essence, COME explicitly models uncertainty by characterizing a Dirichlet prior distribution over model predictions during TTA. By doing so, COME naturally regularizes the model to favor conservative confidence on unreliable samples. Theoretically, we provide a preliminary analysis showing that COME enhances optimization stability by introducing a data-adaptive lower bound on the entropy. Empirically, our method achieves state-of-the-art performance on commonly used benchmarks, with significant improvements in classification accuracy and uncertainty estimation under various settings including standard, life-long, and open-world TTA: up to $34.5\%$ improvement in accuracy and $15.1\%$ in false positive rate.
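
To make the contrast between plain EM and an uncertainty-aware objective concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so the mapping from logits to Dirichlet parameters below (evidence `exp(z_k)`, `alpha_k = evidence_k + 1`, and an explicit uncertainty mass `K / sum(alpha)`) is an assumption borrowed from common evidential-learning practice. The function names `softmax_entropy` and `opinion_entropy` are hypothetical.

```python
import math


def softmax_entropy(logits):
    """Shannon entropy of softmax(logits), the quantity plain EM minimizes."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def opinion_entropy(logits):
    """Entropy over K class beliefs plus one explicit uncertainty mass.

    Assumed parameterization (not taken from the abstract):
      evidence_k = exp(z_k), alpha_k = evidence_k + 1, S = sum(alpha),
      belief_k = evidence_k / S, uncertainty = K / S.
    The K beliefs and the uncertainty mass sum to 1.
    Note: math.exp overflows for very large logits; this sketch does not guard against that.
    """
    k = len(logits)
    evidence = [math.exp(z) for z in logits]
    strength = sum(evidence) + k
    masses = [e / strength for e in evidence] + [k / strength]
    return -sum(p * math.log(p) for p in masses if p > 0.0)


if __name__ == "__main__":
    for z in ([0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [-5.0, -5.0, -5.0]):
        print(z, softmax_entropy(z), opinion_entropy(z))
```

Under this assumed parameterization, the second objective can be lowered either by committing to a class or by shifting mass to the uncertainty term, whereas softmax entropy can only be lowered by becoming more confident in some class. Any additional constraint the paper uses to obtain its data-adaptive lower bound is not reproduced here.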

Qingyang Zhang, Yatao Bian, Xinke Kong, Peilin Zhao, Changqing Zhang • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Mathematical Reasoning | CollegeMATH | Accuracy: 25.42 | 276 |
| Mathematical Reasoning | AIME 24 | Accuracy: 6.67 | 154 |
| Image Classification | ImageNet-C level 5 | Avg Top-1 Acc (ImageNet-C L5): 58.5 | 110 |
| Reasoning | GSM8K | -- | 106 |
| Image Classification | ImageNet-C Severity 5 (test) | Mean Error Rate (Severity 5): 43 | 104 |
| Reasoning | MATH 500 | Accuracy (%): 48.8 | 90 |
| Mathematical Reasoning | Minerva | Accuracy (Acc): 20.96 | 62 |
| Medical Image Segmentation | REFUGE | Dice Score: 0.8304 | 49 |
| Image Classification | ImageNet-C blind-spot subset level 5 | Accuracy (N): 41.4 | 35 |
| Medical Image Segmentation | BraTS-PED | Dice Score: 86.71 | 29 |
Showing 10 of 21 rows
