Unitary Evolution Recurrent Neural Networks

About

Recurrent neural networks (RNNs) are notoriously difficult to train. When the eigenvalues of the hidden-to-hidden weight matrix deviate from absolute value 1, optimization becomes difficult due to the well-studied issue of vanishing and exploding gradients, especially when trying to learn long-term dependencies. To circumvent this problem, we propose a new architecture that learns a unitary weight matrix, with eigenvalues of absolute value exactly 1. The challenge we address is that of parameterizing unitary matrices in a way that does not require expensive computations (such as eigendecomposition) after each weight update. We construct an expressive unitary weight matrix by composing several structured matrices that act as building blocks with parameters to be learned. Optimization with this parameterization becomes feasible only when considering hidden states in the complex domain. We demonstrate the potential of this architecture by achieving state-of-the-art results on several hard tasks involving very long-term dependencies.

Martin Arjovsky, Amar Shah, Yoshua Bengio • 2015
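
The abstract describes building a unitary weight matrix by composing structured unitary building blocks, but does not name the blocks. The following is a minimal NumPy sketch of that idea, not the authors' implementation. It assumes the blocks are diagonal phase matrices, complex Householder reflections, a fixed permutation, and the unitary discrete Fourier transform, composed in one particular order. The function names (`diag_phase`, `reflection`, `build_unitary`) are hypothetical, and the parameters are drawn at random rather than learned.

```python
import numpy as np

def diag_phase(theta):
    # Diagonal matrix with entries exp(i * theta_j). Every entry has
    # modulus 1, so the matrix is unitary for any real theta.
    return np.diag(np.exp(1j * theta))

def reflection(v):
    # Complex Householder reflection I - 2 v v^H / ||v||^2.
    # It is unitary for any nonzero complex vector v.
    v = v.reshape(-1, 1)
    return np.eye(len(v)) - 2.0 * (v @ v.conj().T) / np.vdot(v, v).real

def build_unitary(n, rng):
    # Assumed composition order (an illustration, not taken from the text above):
    # W = D3 R2 F^{-1} D2 P R1 F D1
    F = np.fft.fft(np.eye(n), norm="ortho")   # unitary DFT matrix
    P = np.eye(n)[rng.permutation(n)]         # fixed permutation matrix
    D1, D2, D3 = (diag_phase(rng.uniform(-np.pi, np.pi, n)) for _ in range(3))
    R1, R2 = (
        reflection(rng.normal(size=n) + 1j * rng.normal(size=n))
        for _ in range(2)
    )
    # A product of unitary matrices is unitary, so no projection or
    # eigendecomposition is needed to keep W unitary.
    return D3 @ R2 @ F.conj().T @ D2 @ P @ R1 @ F @ D1

def norm_after_steps(W, h, steps):
    # Apply the linear recurrence h <- W h repeatedly and return the final norm.
    for _ in range(steps):
        h = W @ h
    return np.linalg.norm(h)
```

Because every factor is unitary, `build_unitary(n, rng)` returns a matrix whose eigenvalues all have absolute value 1, and `norm_after_steps` leaves the norm of a complex hidden state unchanged (up to floating-point error) however many steps are taken. In a trainable version, the phases and reflection vectors would be the learned parameters; this sketch omits the nonlinearity and input terms of a full RNN.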

Related benchmarks

Task	Dataset	Result
Code Generation	HumanEval	--	1043
Image Classification	MNIST (test)	Accuracy 95.1	894
Multi-turn Dialogue Evaluation	MT-Bench	--	532
Pixel-by-pixel Image Classification	Permuted Sequential MNIST (pMNIST) (test)	Accuracy 91.4	79
Sequential Image Classification	PMNIST (test)	Accuracy (Test) 91.4	77
General Language Understanding	GLUE	Accuracy 90.9	75
Image Classification	permuted MNIST (pMNIST) (test)	Accuracy 92.6	69
Sequential Image Classification	MNIST Sequential (test)	Accuracy 98.2	51
Permuted Sequential Image Classification	MNIST Permuted Sequential	Test Accuracy Mean 92.6	50
Character-level Prediction	PTB (test)	--	42

Showing 10 of 29 rows
