
Efficient compression of neural networks and datasets

About

Compression and generalization are fundamentally related through Solomonoff induction and the minimum description length (MDL) principle, which predict that simpler models generalize better when data arises from low-complexity distributions. In this article, we combine insights from algorithmic information theory and techniques from neural network pruning to improve model generalization by identifying the most effective data compression method. Since exact MDL optimization is intractable, we cast it as $\ell_0$-regularized learning and explain why parameter sparsity provides an effective computable approximation of model description length. To identify the best practical approach, we systematically compare and refine complementary sparse optimization methods. In particular, we improve probabilistic pruning through a procedure that does not require Monte Carlo sampling and refine smooth $\ell_0$ approximations with a binary search routine that reduces hyperparameter complexity. Across convolutional networks and transformers evaluated on image and text datasets, our refined methods improve upon their predecessors, achieve substantial model compression with minimal accuracy loss, and yield short data description lengths. Finally, we use these methods in a controlled teacher-student setting to empirically verify the prediction of Solomonoff induction that compressed models learn more sample-efficiently and generalize better.

Lukas Silvester Barth, Paulo von Petersenn • 2025
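
To make the link between sparsity and description length concrete, here is a minimal sketch of a two-part code: the bits needed to store the nonzero parameters plus the bits needed to encode the data given the model. It corresponds to the generic objective $-\log p_\theta(D) + \lambda \|\theta\|_0$ and is not necessarily the authors' exact formulation. The function names, the 32-bit cost per stored value, the $\log_2 n$ index cost, and the magnitude-pruning baseline are all assumptions chosen for illustration; the refined probabilistic and smooth $\ell_0$ methods described in the abstract are not reproduced here.

```python
import math


def l0_norm(params, tol=0.0):
    """Count parameters whose magnitude exceeds tol (the l0 'norm')."""
    return sum(1 for p in params if abs(p) > tol)


def magnitude_prune(params, keep):
    """Zero all but the `keep` largest-magnitude parameters.

    A simple baseline used only for illustration; it is not the
    pruning procedure proposed in the paper.
    """
    keep = max(keep, 0)
    ranked = sorted(range(len(params)), key=lambda i: abs(params[i]), reverse=True)
    kept = set(ranked[:keep])
    return [p if i in kept else 0.0 for i, p in enumerate(params)]


def two_part_description_length(params, nll_nats, bits_per_param=32):
    """Return model bits + data bits for a sparse parameter vector.

    Assumed model cost: each nonzero parameter costs `bits_per_param`
    bits for its value plus log2(n) bits for its position.
    Data cost: the negative log-likelihood, converted from nats to bits.
    """
    n = len(params)
    k = l0_norm(params)
    index_bits = math.log2(n) if n > 1 else 0.0
    model_bits = k * (bits_per_param + index_bits)
    data_bits = nll_nats / math.log(2)
    return model_bits + data_bits


def compression_ratio(params):
    """Total parameter count divided by nonzero count (assumed definition)."""
    k = l0_norm(params)
    return len(params) / k if k else math.inf


if __name__ == "__main__":
    theta = [0.9, -0.02, 0.0, 1.5, 0.01, -0.7, 0.03, 0.0]
    sparse = magnitude_prune(theta, keep=3)
    print("nonzero before/after:", l0_norm(theta), l0_norm(sparse))
    print("compression ratio:", compression_ratio(sparse))
    # The NLL value here is a made-up placeholder, not a measured result.
    print("description length (bits):", two_part_description_length(sparse, nll_nats=10 * math.log(2)))
```

Under this accounting, pruning lowers the model term, while any loss in fit raises the data term. The total description length is what an MDL-style comparison between models would look at.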

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Pruning | MNIST | Compression Ratio (CR): 342 | 30 |
| Model Pruning | CIFAR-10 (test) | Efficiency Index (EI): 3.88 | 11 |
| Pruning | ImageNet | Effective Information (EI): 1.5 | 11 |
| Text Compression | Wiki-40B 300MB | Description Length (MB): 61 | 8 |
| Text Compression | Wiki-40B | Compressed Size (MB): 214 | 8 |
| Text Compression | Wiki-40B 6160MB | Description Length (MB): 971 | 8 |
