
A Simple Way to Initialize Recurrent Networks of Rectified Linear Units

About

Learning long-term dependencies in recurrent networks is difficult due to vanishing and exploding gradients. To overcome this difficulty, researchers have developed sophisticated optimization techniques and network architectures. In this paper, we propose a simpler solution that uses recurrent neural networks composed of rectified linear units. Key to our solution is the use of the identity matrix or its scaled version to initialize the recurrent weight matrix. We find that our solution is comparable to LSTM on our four benchmarks: two toy problems involving long-range temporal structures, a large language modeling problem, and a benchmark speech recognition problem.
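The initialization described in the abstract (identity recurrent matrix, ReLU activations) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code; the small-Gaussian scale for the input weights and the zero biases follow the setup reported in the paper, and the function names are ours:

```python
import numpy as np

def init_irnn(hidden_size, input_size, scale=1.0, seed=0):
    """Initialize an IRNN-style recurrent layer.

    Recurrent weights: (scaled) identity matrix, so the hidden state
    is copied forward unchanged at initialization.
    Input weights: small Gaussian (std 0.001, per the paper's setup).
    Biases: zero.
    """
    rng = np.random.default_rng(seed)
    W_hh = scale * np.eye(hidden_size)                    # identity recurrent matrix
    W_xh = rng.normal(0.0, 0.001, (hidden_size, input_size))
    b_h = np.zeros(hidden_size)
    return W_hh, W_xh, b_h

def irnn_step(h_prev, x, W_hh, W_xh, b_h):
    """One ReLU recurrent step: h_t = max(0, W_hh h_{t-1} + W_xh x_t + b)."""
    return np.maximum(0.0, W_hh @ h_prev + W_xh @ x + b_h)
```

With this initialization and zero input, a non-negative hidden state passes through each step unchanged (the identity map composed with ReLU is the identity on non-negative vectors), which is why gradients neither vanish nor explode at the start of training:

```python
W_hh, W_xh, b_h = init_irnn(hidden_size=4, input_size=3)
h = np.array([1.0, 2.0, 3.0, 4.0])
h_next = irnn_step(h, np.zeros(3), W_hh, W_xh, b_h)
# h_next equals h: the state is preserved across the step
```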

Quoc V. Le, Navdeep Jaitly, Geoffrey E. Hinton• 2015

Related benchmarks

Task | Dataset | Result | Rank
Image Classification | MNIST (test) | Accuracy: 97 | 894
Language Modeling | One Billion Word Benchmark (test) | Test Perplexity: 69.4 | 113
Pixel-by-pixel Image Classification | Permuted Sequential MNIST (pMNIST) (test) | Accuracy: 82 | 79
Sequential Image Classification | PMNIST (test) | Accuracy (Test): 82 | 77
Image Classification | permuted MNIST (pMNIST) (test) | Accuracy: 82 | 69
Permuted Sequential Image Classification | MNIST Permuted Sequential | Test Accuracy Mean: 82 | 50
Sequential Image Classification | MNIST Sequential (test) | Accuracy: 97 | 47
Image Classification | pixel-by-pixel MNIST (test) | Accuracy: 97 | 28
Permuted Pixel-by-Pixel MNIST Classification | Permuted MNIST (pMNIST) pixel-by-pixel (test) | Accuracy (Clean): 82 | 25
Phone recognition | TIMIT (test) | Frame Error Rate: 29.7 | 23

Showing 10 of 17 rows.
