
Recurrent Batch Normalization

About

We propose a reparameterization of LSTM that brings the benefits of batch normalization to recurrent neural networks. Whereas previous works only apply batch normalization to the input-to-hidden transformation of RNNs, we demonstrate that it is both possible and beneficial to batch-normalize the hidden-to-hidden transition, thereby reducing internal covariate shift between time steps. We evaluate our proposal on various sequential problems such as sequence classification, language modeling and question answering. Our empirical results show that our batch-normalized LSTM consistently leads to faster convergence and improved generalization.
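The reparameterization described above can be sketched roughly as follows: batch normalization is applied separately to the input-to-hidden term and the hidden-to-hidden term inside each LSTM step, and a third normalization is applied to the cell state before it feeds the output gate. This is an illustrative NumPy sketch under assumed names and shapes, not the authors' implementation (a training-mode version using per-batch statistics; the paper also keeps separate running statistics per time step, omitted here):

```python
import numpy as np

def batch_norm(x, gamma, beta=0.0, eps=1e-5):
    # Normalize each feature over the batch dimension (training-mode statistics).
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def bn_lstm_step(x, h, c, Wx, Wh, b, gamma_x, gamma_h, gamma_c, beta_c):
    """One step of a batch-normalized LSTM (illustrative).

    x: (B, Dx) inputs, h/c: (B, H) previous hidden/cell state,
    Wx: (Dx, 4H), Wh: (H, 4H), b: (4H,) shared bias.
    """
    # Normalize the input-to-hidden AND hidden-to-hidden transformations
    # separately; their betas are folded into the shared bias b.
    gates = batch_norm(x @ Wx, gamma_x) + batch_norm(h @ Wh, gamma_h) + b
    i, f, o, g = np.split(gates, 4, axis=1)
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    # A third batch norm is applied to the new cell state before the output.
    h_new = sigmoid(o) * np.tanh(batch_norm(c_new, gamma_c, beta_c))
    return h_new, c_new
```

Unrolling this step over time gives the batch-normalized LSTM evaluated in the paper; the gammas are typically initialized to a small value (the paper suggests 0.1) to avoid saturating tanh early in training.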

Tim Cooijmans, Nicolas Ballas, César Laurent, Çağlar Gülçehre, Aaron Courville • 2016

Related benchmarks

Task                                    Dataset                                       Result              Rank
Character-level Language Modeling       text8 (test)                                  BPC 1.36            128
Character-level Language Modeling       Penn Treebank (test)                          BPC 1.26            113
Pixel-by-pixel Image Classification     Permuted Sequential MNIST (pMNIST) (test)     Accuracy 95.4       79
Sequential Image Classification         PMNIST (test)                                 Accuracy 95.6       77
Sequential Image Classification         MNIST Sequential (test)                       Accuracy 99         47
Image Classification                    pixel-by-pixel MNIST (test)                   Accuracy 99         28
Character-level Language Modeling       Penn Treebank char-level (test)               BPC 1.32            25
Character-level Language Modeling       text8                                         BPC 1.36            16
Image Classification                    Sequential MNIST                              Accuracy 99         11
Character-level Language Modeling       Penn Treebank character-level (val)           BPC 1.32            10
Showing 10 of 13 rows.
