A Simple Way to Initialize Recurrent Networks of Rectified Linear Units
About
Learning long-term dependencies in recurrent networks is difficult due to vanishing and exploding gradients. To overcome this difficulty, researchers have developed sophisticated optimization techniques and network architectures. In this paper, we propose a simpler solution that uses recurrent neural networks composed of rectified linear units. Key to our solution is the use of the identity matrix, or a scaled version of it, to initialize the recurrent weight matrix. We find that our solution is comparable to LSTM on our four benchmarks: two toy problems involving long-range temporal structures, a large language modeling problem, and a benchmark speech recognition problem.
Quoc V. Le, Navdeep Jaitly, Geoffrey E. Hinton • 2015
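The initialization described in the abstract can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' code: the recurrent weight matrix starts as the (optionally scaled) identity and the hidden biases at zero, while the input-to-hidden weights here are drawn from a small Gaussian as an assumed choice; the function names `init_irnn` and `irnn_step` are hypothetical.

```python
import numpy as np

def init_irnn(hidden_size, input_size, scale=1.0, seed=0):
    """IRNN-style initialization: identity (or scaled identity) recurrent
    weights, zero hidden biases, small random input weights (assumption)."""
    rng = np.random.default_rng(seed)
    W_hh = scale * np.eye(hidden_size)                  # identity recurrent matrix
    W_xh = 0.001 * rng.standard_normal((hidden_size, input_size))
    b_h = np.zeros(hidden_size)                         # zero bias
    return W_hh, W_xh, b_h

def irnn_step(h, x, W_hh, W_xh, b_h):
    """One recurrent step with rectified linear units:
    h' = max(0, W_hh h + W_xh x + b_h)."""
    return np.maximum(0.0, W_hh @ h + W_xh @ x + b_h)
```

With this initialization, a step with zero input maps a non-negative hidden state to itself (ReLU of the identity map), which is the intuition for why gradients neither vanish nor explode early in training.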
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Image Classification | MNIST (test) | Accuracy: 97% | 882 |
| Language Modeling | One Billion Word Benchmark (test) | Test Perplexity: 69.4 | 108 |
| Pixel-by-pixel Image Classification | Permuted Sequential MNIST (pMNIST) (test) | Accuracy: 82% | 79 |
| Sequential Image Classification | pMNIST (test) | Test Accuracy: 82% | 77 |
| Image Classification | Permuted MNIST (pMNIST) (test) | Accuracy: 82% | 63 |
| Permuted Sequential Image Classification | MNIST Permuted Sequential | Mean Test Accuracy: 82% | 50 |
| Sequential Image Classification | MNIST Sequential (test) | Accuracy: 97% | 47 |
| Image Classification | Pixel-by-pixel MNIST (test) | Accuracy: 97% | 28 |
| Permuted Pixel-by-Pixel MNIST Classification | Permuted MNIST (pMNIST) pixel-by-pixel (test) | Accuracy (Clean): 82% | 25 |
| Phone Recognition | TIMIT (test) | Frame Error Rate: 29.7% | 23 |

Showing 10 of 17 rows.