Variational Dropout Sparsifies Deep Neural Networks

About

We explore the recently proposed Variational Dropout technique, which provided an elegant Bayesian interpretation of Gaussian Dropout. We extend Variational Dropout to the case when dropout rates are unbounded, propose a way to reduce the variance of the gradient estimator, and report the first experimental results with individual dropout rates per weight. Interestingly, this leads to extremely sparse solutions in both fully-connected and convolutional layers. The effect is similar to the automatic relevance determination (ARD) effect in empirical Bayes but has a number of advantages. We reduce the number of parameters by up to 280 times on LeNet architectures and up to 68 times on VGG-like networks with a negligible decrease in accuracy.
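The mechanism summarized above can be sketched for a single fully-connected layer. This is a minimal forward-pass sketch in pure Python under stated assumptions, not the authors' implementation. It assumes the paper's additive noise reparameterization (learning theta and log sigma^2 directly, with log alpha = log sigma^2 - log theta^2), the local reparameterization trick for sampling pre-activations, a sigmoid-based approximation of the KL term with constants k1 = 0.63576, k2 = 1.87320, k3 = 1.48695, and pruning of weights whose log alpha exceeds 3. The function names, the [-8, 8] clip on log alpha, and the [in][out] list-of-lists weight layout are hypothetical choices made for illustration. Training (minimizing the negative data log-likelihood plus the summed KL terms by gradient descent) needs an autodiff framework and is omitted.

```python
import math
import random

# Constants of the sigmoid approximation to -KL(q || p) under the
# log-uniform prior, as assumed for Sparse VD in this sketch.
K1, K2, K3 = 0.63576, 1.87320, 1.48695
PRUNE_THRESHOLD = 3.0   # assumed: prune weights with log alpha > 3
LOG_ALPHA_CLIP = 8.0    # hypothetical numerical safeguard


def log_alpha(theta, log_sigma2):
    """Additive reparameterization: alpha = sigma^2 / theta^2 (clipped)."""
    la = log_sigma2 - math.log(theta * theta + 1e-16)  # epsilon avoids log(0)
    return max(-LOG_ALPHA_CLIP, min(LOG_ALPHA_CLIP, la))


def neg_kl(la):
    """Approximate -KL for one weight from its (clipped) log alpha."""
    sig = 1.0 / (1.0 + math.exp(-(K2 + K3 * la)))
    return K1 * sig - 0.5 * math.log1p(math.exp(-la)) - K1


def total_kl(theta, log_sigma2):
    """Sum of per-weight KL terms: the regularizer added to the data loss."""
    return -sum(neg_kl(log_alpha(t, s))
                for trow, srow in zip(theta, log_sigma2)
                for t, s in zip(trow, srow))


def forward_train(x, theta, log_sigma2, rng):
    """Local reparameterization: sample each pre-activation b_j from
    N(sum_i x_i theta_ij, sum_i x_i^2 sigma_ij^2) instead of sampling weights."""
    out = []
    for j in range(len(theta[0])):
        mean = sum(x[i] * theta[i][j] for i in range(len(x)))
        var = sum(x[i] ** 2 * math.exp(log_sigma2[i][j]) for i in range(len(x)))
        out.append(mean + math.sqrt(var) * rng.gauss(0.0, 1.0))
    return out


def prune_mask(theta, log_sigma2, threshold=PRUNE_THRESHOLD):
    """True where a weight is kept (its log alpha is below the threshold)."""
    return [[log_alpha(t, s) < threshold for t, s in zip(trow, srow)]
            for trow, srow in zip(theta, log_sigma2)]


def forward_eval(x, theta, mask):
    """Deterministic test-time pass using only the surviving weights' means."""
    return [sum(x[i] * theta[i][j] for i in range(len(x)) if mask[i][j])
            for j in range(len(theta[0]))]


# Toy 2-input, 2-output layer; weights stored as [in][out].
theta = [[0.5, 1e-3], [-0.8, 2e-3]]
log_sigma2 = [[-6.0, -2.0], [-6.0, -2.0]]
mask = prune_mask(theta, log_sigma2)
print(mask)  # [[True, False], [True, False]]
print(forward_eval([1.0, 1.0], theta, mask))
print(forward_train([1.0, 1.0], theta, log_sigma2, random.Random(0)))
```

In the toy layer, the second output column has tiny means and large variances, so its log alpha saturates at the clip value and both of its weights are pruned. This is the per-weight sparsification effect in miniature: a weight whose noise dominates its mean carries no signal and can be removed.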

Dmitry Molchanov, Arsenii Ashukha, Dmitry Vetrov (2017)

Related benchmarks

Task	Dataset	Result	Rank
Image Classification	MNIST (test)	--	894
Classification	controlled 784-feature benchmark (test)	Test Accuracy (%): 92.25	40
Pruning	MNIST	Compression Ratio (CR): 280	30
Model Pruning	CIFAR-10 (test)	Efficiency Index (EI): 0.00e+0	11
Model Sparsification	MNIST LeNet-300-100 (test)	Test Error: 1.41	7
Image Classification	MNIST	Accuracy: 98.2	6
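The compression ratio in the Pruning row is, as far as we understand the paper's convention, the number of weights in the dense network divided by the number of non-zero weights left after pruning (|W| / |W != 0|). A small sketch of that arithmetic follows; the function name and the mask layout (one list of boolean rows per layer, True meaning kept) are hypothetical choices for illustration.

```python
from typing import Sequence

Mask = Sequence[Sequence[bool]]  # one layer: rows of keep (True) / prune (False)


def compression_ratio(layer_masks: Sequence[Mask]) -> float:
    """|W| / |W != 0|: dense weight count over surviving weight count."""
    total = sum(len(row) for mask in layer_masks for row in mask)
    kept = sum(sum(row) for mask in layer_masks for row in mask)
    if kept == 0:
        raise ValueError("every weight was pruned; the ratio is undefined")
    return total / kept


# Two toy layers: 8 weights in total, 2 of them kept.
layers = [
    [[True, False, False], [False, False, False]],
    [[False, True]],
]
print(compression_ratio(layers))  # 4.0
```

Under this definition, a ratio of 280 means that roughly one weight in 280 survives pruning.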
