Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

Dataset Distillation via Factorization

About

In this paper, we study \xw{dataset distillation (DD)}, from a novel perspective and introduce a \emph{dataset factorization} approach, termed \emph{HaBa}, which is a plug-and-play strategy portable to any existing DD baseline. Unlike conventional DD approaches that aim to produce distilled and representative samples, \emph{HaBa} explores decomposing a dataset into two components: data \emph{Ha}llucination networks and \emph{Ba}ses, where the latter is fed into the former to reconstruct image samples. The flexible combinations between bases and hallucination networks, therefore, equip the distilled data with exponential informativeness gain, which largely increase the representation capability of distilled datasets. To furthermore increase the data efficiency of compression results, we further introduce a pair of adversarial contrastive constraints on the resultant hallucination networks and bases, which increase the diversity of generated images and inject more discriminant information into the factorization. Extensive comparisons and experiments demonstrate that our method can yield significant improvement on downstream classification tasks compared with previous state of the arts, while reducing the total number of compressed parameters by up to 65\%. Moreover, distilled datasets by our approach also achieve \textasciitilde10\% higher accuracy than baseline methods in cross-architecture generalization. Our code is available \href{https://github.com/Huage001/DatasetFactorization}{here}.

Songhua Liu, Kai Wang, Xingyi Yang, Jingwen Ye, Xinchao Wang• 2022

Related benchmarks

TaskDatasetResultRank
Image ClassificationFashion MNIST (test)
Accuracy90.3
633
Image ClassificationCIFAR10 (test)
Accuracy48.3
585
Image ClassificationImageWoof (test)
Accuracy38.6
254
Image ClassificationCIFAR-100
Accuracy48.5
204
Image ClassificationMNIST (test)
Accuracy98.1
201
Image ClassificationImageNette (test)
Accuracy64.7
164
Image ClassificationCIFAR-10 (test)
Test Accuracy74
154
Image ClassificationCIFAR-100 (test)
Acc47
110
Image ClassificationImageNet I-Squawk (test)
Accuracy56.8
71
Image ClassificationImageNet I-Woof (test)
Accuracy38.6
36
Showing 10 of 18 rows

Other info

Follow for update