ELFS: Label-Free Coreset Selection with Proxy Training Dynamics

About

High-quality human-annotated data is crucial for modern deep learning pipelines, yet the human annotation process is both costly and time-consuming. Given a constrained human labeling budget, selecting an informative and representative data subset for labeling can significantly reduce human annotation effort. Well-performing state-of-the-art (SOTA) coreset selection methods require ground truth labels over the whole dataset, failing to reduce the human labeling burden. Meanwhile, SOTA label-free coreset selection methods deliver inferior performance due to poor geometry-based difficulty scores. In this paper, we introduce ELFS (Effective Label-Free Coreset Selection), a novel label-free coreset selection method. ELFS significantly improves label-free coreset selection by addressing two challenges: 1) ELFS utilizes deep clustering to estimate training dynamics-based data difficulty scores without ground truth labels; 2) Pseudo-labels introduce a distribution shift in the data difficulty scores, and we propose a simple but effective double-end pruning method to mitigate bias on calculated scores. We evaluate ELFS on four vision benchmarks and show that, given the same vision encoder, ELFS consistently outperforms SOTA label-free baselines. For instance, when using SwAV as the encoder, ELFS outperforms D2 by up to 10.2% in accuracy on ImageNet-1K. We make our code publicly available on GitHub.

Haizhong Zheng, Elisa Tsai, Yifu Lu, Jiachen Sun, Brian R. Bartoldson, Bhavya Kailkhura, Atul Prakash • 2024
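
The double-end pruning step mentioned in the abstract can be illustrated with a minimal sketch. This is one plausible reading, not the authors' implementation: it assumes each sample already has a difficulty score (higher means harder) computed from proxy training dynamics on pseudo-labels, drops a fraction of the hardest samples first, then drops the easiest samples until the labeling budget is met. The function name `double_end_prune` and the parameters `budget` and `hard_fraction` are hypothetical.

```python
def double_end_prune(scores, budget, hard_fraction):
    """Return sorted indices of the samples kept for labeling.

    scores        : per-sample difficulty, higher = harder (assumed convention)
    budget        : number of samples to keep
    hard_fraction : fraction of the hardest samples to drop first
                    (hypothetical knob, not taken from the paper)
    """
    n = len(scores)
    if not 0 <= budget <= n:
        raise ValueError("budget must be between 0 and len(scores)")
    if not 0.0 <= hard_fraction < 1.0:
        raise ValueError("hard_fraction must be in [0, 1)")

    # Indices ordered from easiest to hardest.
    order = sorted(range(n), key=lambda i: scores[i])

    # Prune the hard end, but never below the budget.
    n_hard = min(int(n * hard_fraction), n - budget)
    remaining = order[: n - n_hard]

    # Prune the easy end until exactly `budget` samples remain.
    kept = remaining[len(remaining) - budget:]
    return sorted(kept)


# Toy scores for eight samples; keep three after dropping the hardest 25%.
toy_scores = [0.8, 0.1, 0.5, 0.3, 0.9, 0.2, 0.7, 0.4]
selected = double_end_prune(toy_scores, budget=3, hard_fraction=0.25)
print(selected)
```

In this toy run the two hardest samples (indices 0 and 4) and the three easiest (indices 1, 3, and 5) are discarded, leaving a mid-to-hard band. Pruning the hard end is a natural guard against samples whose scores are unreliable because their pseudo-labels are likely wrong, though the exact rule used by ELFS should be checked against the paper and its released code.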

Related benchmarks

Task | Dataset | Result | Rank
Image Classification | SUN397 (test) | Top-1 Accuracy 60.6 | 231
Image Classification | Food-101 (test) | Accuracy 77.2 | 145
Image Classification | CIFAR-100-C 30% corrupted (test) | Accuracy 73.1 | 45
Image Classification | CIFAR-100-LT balanced imbalance factor 0.1 (test) | Accuracy 56 | 45
Image Classification | CIFAR-100 LT IF=0.01 (test) | Accuracy 35 | 45
Image Classification | Tiny-ImageNet-C 30% corrupted (test) | Accuracy 40.9 | 45
Image Classification | Caltech-101 naturally imbalanced (test) | Accuracy 75.7 | 45
Image Classification | CIFAR-100 (test) | Accuracy (k=30) 77.3 | 12
Image Classification | ImageNet 1k (test) | Accuracy (30% Threshold) 73.5 | 9
Image Classification | CIFAR-10 (test) | Accuracy (30%) 95.3 | 9