Efficient compression of neural networks and datasets

About

Compression and generalization are fundamentally related through Solomonoff induction and the minimum description length principle (MDL), which predict that simpler models generalize better when data arises from low-complexity distributions. In this article, we combine insights from algorithmic information theory and techniques from neural network pruning to improve model generalization by identifying the most effective data compression method. Since exact MDL optimization is intractable, we cast it as $\ell_0$ regularized learning and explain why parameter sparsity provides an effective computable approximation of model description length. To identify the best practical approach, we systematically compare and refine complementary sparse optimization methods. In particular, we improve probabilistic pruning through a procedure that does not require Monte Carlo sampling and refine smooth $\ell_0$ approximations with a binary search routine that reduces hyperparameter complexity. Across convolutional networks and transformers evaluated on image and text datasets, our refined methods improve upon their predecessors, achieve substantial model compression with minimal accuracy loss, and yield short data description lengths. Finally, we use these methods in a controlled teacher-student setting to empirically verify the prediction of Solomonoff induction that compressed models learn more sample-efficiently and generalize better.
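As a rough illustration of why parameter sparsity can stand in for model description length, the sketch below computes a simplified two-part code length: bits for the nonzero parameters plus bits for the labels given the model. It is not the authors' procedure. The function names, the flat 32 bits per nonzero parameter, and the toy numbers are assumptions made for illustration, and the cost of encoding which positions are nonzero is ignored.

```python
import math

def l0_norm(params, tol=0.0):
    """Count the parameters whose magnitude exceeds tol (the l0 'norm')."""
    return sum(1 for p in params if abs(p) > tol)

def data_bits(probs_of_true_labels):
    """Bits needed to encode the labels given the model: -sum(log2 p)."""
    return -sum(math.log2(p) for p in probs_of_true_labels)

def two_part_description_length(params, probs_of_true_labels, bits_per_param=32):
    """Simplified two-part code: model bits plus data-given-model bits.

    Assumption: every nonzero parameter costs a flat bits_per_param bits,
    and the positions of the nonzero parameters are not charged for.
    """
    model_bits = bits_per_param * l0_norm(params)
    return model_bits + data_bits(probs_of_true_labels)

# Hypothetical toy numbers: the pruned model fits slightly worse
# (one label gets probability 0.25 instead of 0.5) but has half the parameters.
dense_params = [0.7, -0.2, 0.01, 0.3]
sparse_params = [0.7, 0.0, 0.0, 0.3]
dense_probs = [0.5, 0.5, 0.25, 0.5]
sparse_probs = [0.5, 0.5, 0.25, 0.25]

dense_total = two_part_description_length(dense_params, dense_probs)
sparse_total = two_part_description_length(sparse_params, sparse_probs)
print(dense_total, sparse_total)
```

In this toy case the pruned model pays one extra bit on the data but saves 64 bits on the parameters, so its total description length is shorter. This is the trade-off that an $\ell_0$ penalty is meant to capture.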

Lukas Silvester Barth, Paulo von Petersenn • 2025

Related benchmarks

Task	Dataset	Metric	Result	
Pruning	MNIST	Compression Ratio (CR)	342	30
Model Pruning	CIFAR-10 (test)	Efficiency Index (EI)	3.88	11
Pruning	ImageNet	Effective Information (EI)	1.5	11
Text Compression	Wiki-40B 300MB	Description Length (MB)	61	8
Text Compression	Wiki-40B	Compressed Size (MB)	214	8
Text Compression	Wiki-40B 6160MB	Description Length (MB)	971	8
