
Orthogonal Recurrent Neural Networks with Scaled Cayley Transform

About

Recurrent Neural Networks (RNNs) are designed to handle sequential data but suffer from vanishing or exploding gradients. Recent work on Unitary Recurrent Neural Networks (uRNNs) has addressed this issue and, in some cases, exceeded the capabilities of Long Short-Term Memory networks (LSTMs). We propose a novel, simpler update scheme that maintains orthogonal recurrent weight matrices without using complex-valued matrices. This is done by parametrizing with a skew-symmetric matrix via the Cayley transform. Such a parametrization cannot represent matrices with an eigenvalue of negative one, but this limitation is overcome by scaling the recurrent weight matrix by a diagonal matrix consisting of ones and negative ones. The proposed training scheme involves a straightforward gradient calculation and update step. In several experiments, the proposed scaled Cayley orthogonal recurrent neural network (scoRNN) achieves superior results with fewer trainable parameters than other unitary RNNs.
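The parametrization described above can be sketched in a few lines of NumPy: the recurrent matrix is formed as W = (I + A)⁻¹(I − A)D, where A is skew-symmetric and D is a diagonal matrix of ±1 entries. The function name below is illustrative, not from the paper's code; this is a minimal sketch of the construction, not the full training scheme.

```python
import numpy as np

def scaled_cayley(A, D):
    """Scaled Cayley transform: W = (I + A)^{-1} (I - A) D.

    A must be skew-symmetric (A = -A.T); D diagonal with +/-1 entries.
    The result W is then orthogonal: W @ W.T = I.
    """
    n = A.shape[0]
    I = np.eye(n)
    # Solve (I + A) W0 = (I - A) instead of forming an explicit inverse.
    return np.linalg.solve(I + A, I - A) @ D

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M - M.T                                         # skew-symmetric
D = np.diag(np.where(rng.random(5) < 0.5, -1.0, 1.0))  # diagonal of +/-1
W = scaled_cayley(A, D)
print(np.allclose(W @ W.T, np.eye(5)))  # W is orthogonal by construction
```

Because A has only n(n−1)/2 free entries, gradients can be taken with respect to A alone and W stays exactly orthogonal after every update, which is the source of the parameter savings the abstract mentions.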

Kyle Helfrich, Devin Willmott, Qiang Ye • 2017

Related benchmarks

Task | Dataset | Metric | Result | Rank
Classification | CIFAR10 (test) | Accuracy | 81.7 | 266
Image Classification | Permuted MNIST T=784 (test) | Accuracy | 96.6 | 62
Character-level Prediction | PTB (test) | BPC | 1.36 | 42
Sequential Image Classification | MNIST ordered pixel-by-pixel 1.0 (test) | Accuracy | 92.9 | 32
Keyword Spotting | Google Speech Commands Google12 V2 (test) | Accuracy | 94.9 | 22
Word-level Prediction | PTB word-level (test) | Perplexity | 116.9 | 19
Sequential Image Recognition | sMNIST | Test Accuracy | 98.9 | 16
Speech Prediction | TIMIT (val) | MSE | 7.97 | 13
Speech Prediction | TIMIT (test) | MSE | 7.36 | 13
Heart-rate Prediction | PPG data, TSR archive (test) | L2 Error | 9.93 | 13
(Showing 10 of 12 benchmark rows.)
