Variational Learning is Effective for Large Deep Networks

About

We give extensive empirical evidence against the common belief that variational learning is ineffective for large neural networks. We show that an optimizer called Improved Variational Online Newton (IVON) consistently matches or outperforms Adam for training large networks such as GPT-2 and ResNets from scratch. IVON's computational costs are nearly identical to Adam but its predictive uncertainty is better. We show several new use cases of IVON where we improve finetuning and model merging in Large Language Models, accurately predict generalization error, and faithfully estimate sensitivity to data. We find overwhelming evidence that variational learning is effective.

Yuesong Shen, Nico Daheim, Bai Cong, Peter Nickl, Gian Maria Marconi, Clement Bazan, Rio Yokota, Iryna Gurevych, Daniel Cremers, Mohammad Emtiyaz Khan, Thomas Möllenhoff • 2024

Related benchmarks

Task	Dataset	Metric	Value
Image Classification	FashionMNIST (test)	Accuracy	91.3	461
Image Classification	CIFAR-100 (test)	Top-1 Accuracy	67.4	429
Image Classification	CIFAR-100	Accuracy	60.7	375
Out-of-Distribution Detection	CIFAR-10 vs SVHN (test)	AUROC	0.836	146
Out-of-Distribution Detection	FashionMNIST (In-Distribution) vs EMNIST (Out-of-Distribution) (test)	AUROC	0.82	46
Image Classification	SUN397	Accuracy	77.27	40
OOD Detection	CIFAR-10 vs SVHN (test)	AUROC	86	34
Image Classification	Fashion MNIST	Accuracy (ACC)	91.1	16
OOD Detection	In: CIFAR-100, Out: TinyImageNet (test)	FPR@95%	34.2	16
Image Classification	SVHN	Accuracy	97.01	15

Showing 10 of 20 rows
