Improving the Gating Mechanism of Recurrent Neural Networks

About

Gating mechanisms are widely used in neural network models, where they allow gradients to backpropagate more easily through depth or time. However, their saturation property introduces problems of its own. For example, in recurrent models these gates need to have outputs near 1 to propagate information over long time-delays, which requires them to operate in their saturation regime and hinders gradient-based learning of the gate mechanism. We address this problem by deriving two synergistic modifications to the standard gating mechanism that are easy to implement, introduce no additional hyperparameters, and improve learnability of the gates when they are close to saturation. We show how these changes are related to and improve on alternative recently proposed gating mechanisms such as chrono initialization and Ordered Neurons. Empirically, our simple gating mechanisms robustly improve the performance of recurrent models on a range of applications, including synthetic memorization tasks, sequential image classification, language modeling, and reinforcement learning, particularly when long-term dependencies are involved.
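The saturation problem described above can be made concrete with a small calculation. If a gate holds a constant value f, a stored value is scaled by f at every step, so it survives for roughly 1/(1 − f) steps. For a sigmoid gate f = σ(x), the derivative with respect to the pre-activation x is σ'(x) = f(1 − f). The sketch below assumes this simplified setting: a single scalar gate with a fixed value, with 1/(1 − f) taken as its memory timescale T. It shows that the gradient reaching the gate's pre-activation shrinks roughly as 1/T when the gate is tuned to remember for T steps. The helper names (`gate_for_timescale` and the others) are hypothetical and illustrative. They are not taken from the paper, and the paper's proposed modifications are not reproduced here.

```python
import math


def sigmoid(x: float) -> float:
    """Standard logistic sigmoid, the usual gate activation."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x: float) -> float:
    """Derivative of the sigmoid w.r.t. its pre-activation: f * (1 - f)."""
    f = sigmoid(x)
    return f * (1.0 - f)


def logit(f: float) -> float:
    """Inverse sigmoid: the pre-activation that produces gate value f."""
    return math.log(f / (1.0 - f))


def gate_for_timescale(T: float) -> float:
    """Hypothetical helper: constant gate value whose memory lasts ~T steps.

    Assumes the timescale of a constant gate f is 1 / (1 - f), so f = 1 - 1/T.
    """
    return 1.0 - 1.0 / T


if __name__ == "__main__":
    # Longer memory needs f closer to 1, i.e. a larger (more saturated)
    # pre-activation, and the gradient through the sigmoid shrinks accordingly.
    for T in (2, 10, 100, 1000):
        f = gate_for_timescale(T)
        x = logit(f)
        print(f"T={T:>5}  f={f:.3f}  pre-activation={x:6.3f}  grad={sigmoid_grad(x):.6f}")
```

Under these assumptions the gradient equals (T − 1)/T², falling from 0.25 at T = 2 to about 0.001 at T = 1000. Gates that must hold information over long delays therefore receive very weak learning signals, which is the difficulty the abstract sets out to address.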

Albert Gu, Caglar Gulcehre, Tom Le Paine, Matt Hoffman, Razvan Pascanu · 2019

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Sentiment Analysis | IMDB (test) | Accuracy | 87.3 | 248 |
| Pixel-by-pixel Image Classification | Permuted Sequential MNIST (pMNIST) (test) | Accuracy | 97.58 | 79 |
| Sequential Image Classification | PMNIST (test) | Accuracy (Test) | 96.96 | 77 |
| Time-series classification | CHARACTER TRAJ. (test) | Accuracy | 0.349 | 73 |
| Sequential Image Classification | S-MNIST (test) | Accuracy | 99.28 | 70 |
| Word-level Language Modeling | WikiText-103 word-level (test) | Perplexity | 34.6 | 65 |
| Pixel-level 1-D image classification | Sequential MNIST (test) | Accuracy | 99.28 | 53 |
| Permuted Sequential Image Classification | MNIST Permuted Sequential | Test Accuracy Mean | 96.96 | 50 |
| Sequential Image Classification | Sequential CIFAR10 | Accuracy | 74.4 | 48 |
| 1-D Pixel-level Image Classification | sCIFAR (test) | Accuracy | 74.4 | 46 |

Showing 10 of 18 rows.
