Training Binary Neural Networks using the Bayesian Learning Rule
About
Neural networks with binary weights are computation-efficient and hardware-friendly, but their training is challenging because it involves a discrete optimization problem. Surprisingly, ignoring the discrete nature of the problem and using gradient-based methods, such as the Straight-Through Estimator, still works well in practice. This raises the question: are there principled approaches which justify such methods? In this paper, we propose such an approach using the Bayesian learning rule. The rule, when applied to estimate a Bernoulli distribution over the binary weights, results in an algorithm which justifies some of the algorithmic choices made by the previous approaches. The algorithm not only obtains state-of-the-art performance, but also enables uncertainty estimation for continual learning to avoid catastrophic forgetting. Our work provides a principled approach for training binary neural networks which justifies and extends existing approaches.
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| Image Classification | CIFAR-100 | Accuracy70.33 | 691 | |
| Image Classification | CIFAR-10 | Accuracy92.28 | 564 | |
| Image Classification | Tiny-ImageNet | Accuracy55.84 | 269 | |
| Image Classification | Imagenette | Accuracy79.59 | 36 | |
| Lifelong Object Recognition | OpenLORIS-Object (12-task stream) | Mean Accuracy89.37 | 24 | |
| Out-of-Distribution Detection | OpenLORIS-Object (held-out toy class) | Aleatoric AUC0.99 | 24 | |
| Online Continual Learning | Permuted MNIST 1000-tasks (last 5 tasks) | Mean Accuracy (5 Tasks)86.61 | 16 | |
| Out-of-Distribution Detection | MNIST vs Fashion-MNIST 1000-tasks Permuted | OOD Detection AUC80 | 15 | |
| Image Classification | Permuted-MNIST Single-task | Accuracy (1 Task)93.22 | 8 |