
On the Diversity and Realism of Distilled Dataset: An Efficient Dataset Distillation Paradigm

About

Contemporary machine learning requires training large neural networks on massive datasets and thus faces the challenge of high computational demands. Dataset distillation, a recently emerging strategy, aims to compress real-world datasets for efficient training. However, this line of research currently struggles with large-scale and high-resolution datasets, hindering its practicality and feasibility. To this end, we re-examine the existing dataset distillation methods and identify three properties required for large-scale real-world applications, namely, realism, diversity, and efficiency. As a remedy, we propose RDED, a novel computationally efficient yet effective data distillation paradigm, to enable both diversity and realism of the distilled data. Extensive empirical results over various neural architectures and datasets demonstrate the advancement of RDED: we can distill the full ImageNet-1K to a small dataset comprising 10 images per class within 7 minutes, achieving a notable 42% top-1 accuracy with ResNet-18 on a single RTX-4090 GPU (while the SOTA only achieves 21% but requires 6 hours).

Peng Sun, Bei Shi, Daiwei Yu, Tao Lin • 2023
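
To put the headline numbers in the abstract in perspective, the arithmetic can be sketched as follows. This is only an illustrative calculation, not part of the authors' method: the figures for images per class, distillation time, and top-1 accuracy come from the abstract, while the size of the ImageNet-1K training split (1,281,167 images) and the helper name `summarize` are assumptions introduced here.

```python
# Figures quoted in the abstract.
IMAGES_PER_CLASS = 10
NUM_CLASSES = 1000            # ImageNet-1K has 1000 classes
RDED_MINUTES = 7
SOTA_MINUTES = 6 * 60         # "6 hours"
RDED_TOP1 = 42.0              # percent
SOTA_TOP1 = 21.0              # percent

# Assumption (not stated in the abstract): size of the standard
# ImageNet-1K training split.
IMAGENET_1K_TRAIN_IMAGES = 1_281_167


def summarize():
    """Return the derived quantities implied by the abstract's numbers."""
    distilled_images = IMAGES_PER_CLASS * NUM_CLASSES
    return {
        "distilled_images": distilled_images,
        "fraction_of_train_set": distilled_images / IMAGENET_1K_TRAIN_IMAGES,
        "distillation_speedup": SOTA_MINUTES / RDED_MINUTES,
        "accuracy_gain_points": RDED_TOP1 - SOTA_TOP1,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.4g}")
```

Under these assumptions, the distilled set holds 10,000 images, which is under 1% of the original training split, and the reported distillation time is roughly 50 times shorter than the 6-hour baseline.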

Related benchmarks

Task                  Dataset                 Result                Rank
Image Classification  CIFAR-100 (test)        Accuracy 62.6         3518
Image Classification  CIFAR-10 (test)         Accuracy 68.4         3381
Image Classification  ImageNet-1K 1.0 (val)   Top-1 Accuracy 58.61  2238
Image Classification  ImageNet-1K             Top-1 Acc 65.4        1239
Image Classification  ImageNet 1k (test)      Top-1 Accuracy 61.2   880
Image Classification  CIFAR-10                Accuracy 62.1         875
Image Classification  Tiny ImageNet (test)    Accuracy 47.6         722
Image Classification  CIFAR-100               --                    691
Image Classification  ImageNet-1K             --                    600
Image Classification  ImageNet 1k (test)      Top-1 Accuracy 58.6   456
Showing 10 of 44 rows
