Variational Dropout Sparsifies Deep Neural Networks
About
We explore a recently proposed Variational Dropout technique that provided an elegant Bayesian interpretation of Gaussian Dropout. We extend Variational Dropout to the case where dropout rates are unbounded, propose a way to reduce the variance of the gradient estimator, and report the first experimental results with individual dropout rates per weight. Interestingly, this leads to extremely sparse solutions in both fully-connected and convolutional layers. The effect is similar to the automatic relevance determination effect in empirical Bayes but has a number of advantages. We reduce the number of parameters by up to 280 times on LeNet architectures and by up to 68 times on VGG-like networks with a negligible decrease in accuracy.
Dmitry Molchanov, Arsenii Ashukha, Dmitry Vetrov • 2017
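To make the sparsification idea in the abstract concrete, here is a minimal sketch in plain Python. It assumes each weight has a learned mean `theta` and a learned log-variance `log_sigma2`, so that the per-weight dropout rate is alpha = sigma^2 / theta^2. Weights whose log(alpha) exceeds a cutoff are treated as pure noise and set to zero. The function names, the toy values, and the cutoff of 3.0 are illustrative assumptions, not something stated on this page.

```python
import math

def log_alpha(theta, log_sigma2):
    """Per-weight log dropout rate: log(alpha) = log(sigma^2) - log(theta^2)."""
    # The small constant guards against log(0) for weights that are exactly zero.
    return log_sigma2 - math.log(theta * theta + 1e-8)

def sparsify(thetas, log_sigma2s, threshold=3.0):
    """Zero out weights whose log(alpha) exceeds the (assumed) threshold."""
    return [0.0 if log_alpha(t, s) > threshold else t
            for t, s in zip(thetas, log_sigma2s)]

def compression_ratio(pruned):
    """Total number of weights divided by the number of nonzero weights."""
    kept = sum(1 for w in pruned if w != 0.0)
    return len(pruned) / kept if kept else float("inf")

# Hypothetical toy values: two low-noise weights and two noise-dominated ones.
thetas = [0.8, -1.2, 0.01, 0.002]
log_sigma2s = [-6.0, -5.0, -2.0, -3.0]

pruned = sparsify(thetas, log_sigma2s)
print(pruned, compression_ratio(pruned))
```

In this toy case the two weights with small means relative to their variance are removed, which is the mechanism behind compression figures such as the 280x reported for LeNet. A real implementation would learn `theta` and `log_sigma2` jointly by optimizing a variational objective rather than fixing them by hand.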
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Classification | MNIST (test) | -- | -- | 894 |
| Pruning | MNIST | Compression Ratio (CR) | 280 | 30 |
| Model Pruning | CIFAR-10 (test) | Efficiency Index (EI) | 0.00e+0 | 11 |
| Model Sparsification | MNIST LeNet-300-100 (test) | Test Error | 1.41 | 7 |
| Image Classification | MNIST | Accuracy | 98.2 | 6 |