A Simple Way to Initialize Recurrent Networks of Rectified Linear Units
About
Learning long-term dependencies in recurrent networks is difficult due to vanishing and exploding gradients. To overcome this difficulty, researchers have developed sophisticated optimization techniques and network architectures. In this paper, we propose a simpler solution that uses recurrent neural networks composed of rectified linear units. Key to our solution is the use of the identity matrix, or a scaled version of it, to initialize the recurrent weight matrix. We find that our solution is comparable to LSTM on our four benchmarks: two toy problems involving long-range temporal structures, a large language modeling problem, and a benchmark speech recognition problem.
Quoc V. Le, Navdeep Jaitly, Geoffrey E. Hinton • 2015
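The initialization described in the abstract can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' code: the recurrent weight matrix starts as the (optionally scaled) identity and the hidden biases at zero, while the input-to-hidden weights here are drawn from a small Gaussian as an assumed choice; the function names `init_irnn` and `irnn_step` are hypothetical.

```python
import numpy as np

def init_irnn(hidden_size, input_size, scale=1.0, seed=0):
    """IRNN-style initialization: identity (or scaled identity) recurrent
    weights, zero hidden biases, small random input weights (assumption)."""
    rng = np.random.default_rng(seed)
    W_hh = scale * np.eye(hidden_size)                  # identity recurrent matrix
    W_xh = 0.001 * rng.standard_normal((hidden_size, input_size))
    b_h = np.zeros(hidden_size)                         # zero bias
    return W_hh, W_xh, b_h

def irnn_step(h, x, W_hh, W_xh, b_h):
    """One recurrent step with rectified linear units:
    h' = max(0, W_hh h + W_xh x + b_h)."""
    return np.maximum(0.0, W_hh @ h + W_xh @ x + b_h)
```

With this initialization, a step with zero input maps a non-negative hidden state to itself (ReLU of the identity map), which is the intuition for why gradients neither vanish nor explode early in training.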
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Image Classification | MNIST (test) | Accuracy: 97% | 882 |
| Language Modeling | One Billion Word Benchmark (test) | Test Perplexity: 69.4 | 108 |
| Pixel-by-pixel Image Classification | Permuted Sequential MNIST (pMNIST) (test) | Accuracy: 82% | 79 |
| Sequential Image Classification | pMNIST (test) | Test Accuracy: 82% | 77 |
| Image Classification | Permuted MNIST (pMNIST) (test) | Accuracy: 82% | 63 |
| Permuted Sequential Image Classification | MNIST Permuted Sequential | Mean Test Accuracy: 82% | 50 |
| Sequential Image Classification | MNIST Sequential (test) | Accuracy: 97% | 47 |
| Image Classification | Pixel-by-pixel MNIST (test) | Accuracy: 97% | 28 |
| Permuted Pixel-by-Pixel MNIST Classification | Permuted MNIST (pMNIST) pixel-by-pixel (test) | Accuracy (Clean): 82% | 25 |
| Phone Recognition | TIMIT (test) | Frame Error Rate: 29.7% | 23 |

Showing 10 of 17 rows.