
Dataset Distillation via Committee Voting

About

Dataset distillation aims to synthesize a compact yet representative dataset that preserves the essential characteristics of the original data for efficient model training. Existing methods mainly focus on improving data-synthetic alignment or scaling distillation to large datasets. In this work, we propose $\textbf{C}$ommittee $\textbf{V}$oting for $\textbf{D}$ataset $\textbf{D}$istillation ($\textbf{CV-DD}$), an orthogonal approach that leverages the collective knowledge of multiple models to produce higher-quality distilled data. We first establish a strong baseline that achieves state-of-the-art performance through modern architectural and optimization choices. By integrating distributions and predictions from multiple models and generating high-quality soft labels, our method captures a broader range of data characteristics, reduces model-specific bias and the impact of distribution shifts, and significantly improves generalization. This voting-based strategy enhances diversity and robustness, alleviates overfitting, and improves post-evaluation performance. Extensive experiments across multiple datasets and IPC settings demonstrate that CV-DD consistently outperforms single- and multi-model distillation methods and generalizes well to non-training-based frameworks and challenging synthetic-to-real transfer tasks. Code is available at: https://github.com/Jiacheng8/CV-DD.
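The committee voting idea can be illustrated with a minimal sketch: softmax predictions from several pretrained models are aggregated into a single soft label for each distilled image. This is only an illustrative outline, not the authors' implementation; the function name, the simple weighted averaging, and the temperature parameter are assumptions here, and the official code at the repository above should be consulted for the actual CV-DD procedure.

```python
import torch
import torch.nn.functional as F

def committee_soft_labels(models, images, weights=None, temperature=1.0):
    """Illustrative sketch (not the official CV-DD code): combine softmax
    predictions from a committee of pretrained models into soft labels
    for a batch of distilled images.

    models:      list of torch.nn.Module classifiers (the "committee")
    images:      tensor of shape (N, C, H, W) containing distilled samples
    weights:     optional per-model weights; defaults to a uniform average
    temperature: softmax temperature (hypothetical knob, not from the paper)
    """
    if weights is None:
        weights = [1.0 / len(models)] * len(models)

    soft_labels = None
    with torch.no_grad():
        for w, model in zip(weights, models):
            model.eval()
            probs = F.softmax(model(images) / temperature, dim=1)
            # Accumulate the weighted prediction of each committee member.
            soft_labels = w * probs if soft_labels is None else soft_labels + w * probs
    return soft_labels
```

In this sketch, averaging over several architectures is what reduces model-specific bias: a label produced by a single teacher reflects that teacher's inductive biases, whereas the committee's consensus smooths them out.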

Jiacheng Cui, Zhaoyi Li, Xiaochen Ma, Xinyue Bi, Yaxin Luo, Zhiqiang Shen • 2025

Related benchmarks

Task | Dataset | Metric | Result | Rank
Image Classification | CIFAR-100 | Top-1 Accuracy | 71.1 | 622
Image Classification | Tiny-ImageNet | Accuracy | 64.1 | 227
Image Classification | ImageNet-1K | Top-1 Accuracy | 65.3 | 137
Image Classification | CIFAR-10 | Top-1 Accuracy | 76.9 | 124
Image Classification | VisDA 2017 (Real) | Standard Accuracy | 20.7 | 7
