Recurrent Batch Normalization
About
We propose a reparameterization of LSTM that brings the benefits of batch normalization to recurrent neural networks. Whereas previous works only apply batch normalization to the input-to-hidden transformation of RNNs, we demonstrate that it is both possible and beneficial to batch-normalize the hidden-to-hidden transition, thereby reducing internal covariate shift between time steps. We evaluate our proposal on various sequential problems such as sequence classification, language modeling and question answering. Our empirical results show that our batch-normalized LSTM consistently leads to faster convergence and improved generalization.
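To make the idea concrete, here is a minimal NumPy sketch of one step of a batch-normalized LSTM in the spirit of the paper: batch normalization is applied separately to the input-to-hidden and hidden-to-hidden pre-activations (each with its own statistics), and a third normalization is applied to the cell state before the output gate. All function and parameter names (`batch_norm`, `bn_lstm_step`, `gammas`, etc.) are illustrative, not from the paper's code; details such as parameter initialization and the per-time-step statistics used at inference are omitted.

```python
import numpy as np

def batch_norm(x, gamma, beta=0.0, eps=1e-5):
    # Normalize each feature over the batch dimension, then rescale/shift.
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def bn_lstm_step(x, h, c, Wx, Wh, b, gammas):
    """One BN-LSTM step (hypothetical sketch).

    x: (batch, in_dim) input; h, c: (batch, hid_dim) previous state.
    Wx: (in_dim, 4*hid_dim), Wh: (hid_dim, 4*hid_dim), b: (4*hid_dim,).
    gammas: scale parameters for the three batch norms.
    """
    gx, gh, gc = gammas
    # Normalize the two linear terms SEPARATELY, then sum -- this is the
    # key difference from input-to-hidden-only batch normalization.
    pre = batch_norm(x @ Wx, gx) + batch_norm(h @ Wh, gh) + b
    i, f, o, g = np.split(pre, 4, axis=1)
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    # A third batch norm on the new cell state before the output gate.
    h_new = sigmoid(o) * np.tanh(batch_norm(c_new, gc))
    return h_new, c_new
```

In practice the paper also recommends initializing the batch-norm scale parameters to a small value (around 0.1) to avoid saturating tanh, and keeping separate normalization statistics per time step; the sketch above elides both.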
Tim Cooijmans, Nicolas Ballas, César Laurent, Çağlar Gülçehre, Aaron Courville • 2016
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Character-level Language Modeling | text8 (test) | BPC | 1.36 | 128 |
| Character-level Language Modeling | Penn Treebank (test) | BPC | 1.26 | 113 |
| Pixel-by-pixel Image Classification | Permuted Sequential MNIST (pMNIST) (test) | Accuracy | 95.4 | 79 |
| Sequential Image Classification | PMNIST (test) | Accuracy (Test) | 95.6 | 77 |
| Sequential Image Classification | MNIST Sequential (test) | Accuracy | 99 | 47 |
| Image Classification | pixel-by-pixel MNIST (test) | Accuracy | 99 | 28 |
| Character-level Language Modeling | Penn Treebank char-level (test) | BPC | 1.32 | 25 |
| Character-level Language Modeling | text8 | BPC | 1.36 | 16 |
| Image Classification | Sequential MNIST | Accuracy | 99 | 11 |
| Character-level Language Modeling | Penn Treebank character-level (val) | BPC | 1.32 | 10 |
Showing 10 of 13 rows.