Dataset Distillation via Committee Voting
About
Dataset distillation aims to synthesize a compact yet representative dataset that preserves the essential characteristics of the original data for efficient model training. Existing methods mainly focus on improving data-synthetic alignment or scaling distillation to large datasets. In this work, we propose $\textbf{C}$ommittee $\textbf{V}$oting for $\textbf{D}$ataset $\textbf{D}$istillation ($\textbf{CV-DD}$), an orthogonal approach that leverages the collective knowledge of multiple models to produce higher-quality distilled data. We first establish a strong baseline that achieves state-of-the-art performance through modern architectural and optimization choices. By integrating distributions and predictions from multiple models and generating high-quality soft labels, our method captures a broader range of data characteristics, reduces model-specific bias and the impact of distribution shifts, and significantly improves generalization. This voting-based strategy enhances diversity and robustness, alleviates overfitting, and improves post-evaluation performance. Extensive experiments across multiple datasets and IPC settings demonstrate that CV-DD consistently outperforms single- and multi-model distillation methods and generalizes well to non-training-based frameworks and challenging synthetic-to-real transfer tasks. Code is available at: https://github.com/Jiacheng8/CV-DD.
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| Image Classification | CIFAR-100 | Top-1 Accuracy71.1 | 622 | |
| Image Classification | Tiny-ImageNet | Accuracy64.1 | 227 | |
| Image Classification | ImageNet-1K | Top-1 Accuracy65.3 | 137 | |
| Image Classification | CIFAR-10 | Top-1 Accuracy76.9 | 124 | |
| Image Classification | VisDA 2017 (Real) | Standard Accuracy20.7 | 7 |