Long Expressive Memory for Sequence Modeling
About
We propose a novel method called Long Expressive Memory (LEM) for learning long-term sequential dependencies. LEM is gradient-based, can efficiently process sequential tasks with very long-term dependencies, and is sufficiently expressive to learn complicated input-output maps. To derive LEM, we consider a system of multiscale ordinary differential equations, together with a suitable time-discretization of this system. For LEM, we derive rigorous bounds showing mitigation of the exploding and vanishing gradients problem, a well-known challenge for gradient-based recurrent sequential learning methods. We also prove that LEM can approximate a large class of dynamical systems to high accuracy. Our empirical results, ranging from image and time-series classification through dynamical systems prediction to speech recognition and language modeling, demonstrate that LEM outperforms state-of-the-art recurrent neural networks, gated recurrent units, and long short-term memory models.
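The multiscale-ODE view described above can be sketched as a recurrence with two hidden states evolving on learned, input-dependent time scales, updated by an explicit-Euler-style step. The sketch below is illustrative only: the parameter names (`W1`, `V1`, biases, etc.), the sigmoid gating, and the tanh nonlinearity are assumptions in the spirit of the description, not taken from this page.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lem_step(y, z, u, params, dt=1.0):
    """One step of a two-scale gated recurrence (illustrative LEM-style cell).

    y, z : the two hidden states (fast/slow scales), shape (d,)
    u    : current input, shape (d,)
    params : 12 arrays (W1, V1, b1, W2, V2, b2, Wz, Vz, bz, Wy, Vy, by)
    dt   : base step size of the time discretization
    """
    W1, V1, b1, W2, V2, b2, Wz, Vz, bz, Wy, Vy, by = params
    # Learned, input-dependent time steps in (0, dt): these realize the
    # "multiscale" aspect -- each unit adapts its own effective time scale.
    dt_z = dt * sigmoid(W1 @ y + V1 @ u + b1)
    dt_y = dt * sigmoid(W2 @ y + V2 @ u + b2)
    # Explicit-Euler-style convex updates of the two state variables.
    z_new = (1.0 - dt_z) * z + dt_z * np.tanh(Wz @ y + Vz @ u + bz)
    y_new = (1.0 - dt_y) * y + dt_y * np.tanh(Wy @ z_new + Vy @ u + by)
    return y_new, z_new
```

Because each update is a convex combination of the previous state and a bounded nonlinearity, the states stay bounded over long sequences, which is the intuition behind the gradient bounds mentioned above.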
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Character-level prediction | PTB (test) | BPC | 1.25 | 42 |
| Sequential image classification | MNIST ordered pixel-by-pixel 1.0 (test) | Accuracy | 96.6 | 32 |
| Dynamical systems reconstruction | Lorenz-63 3d | Dstsp | 0.39 | 23 |
| Keyword spotting | Google Speech Commands Google12 V2 (test) | Accuracy | 95.7 | 22 |
| Word-level prediction | PTB word-level (test) | Perplexity | 72.8 | 19 |
| Sequential image recognition | sMNIST | Test accuracy | 99.5 | 16 |
| Heart-rate prediction | PPG data, TSR archive (test) | Test L2 error | 0.85 | 13 |
| Sequential image recognition | nCIFAR-10 | Test accuracy | 60.5 | 8 |
| Dynamical systems reconstruction | Lorenz-96 20d | Dstsp | 7.2 | 8 |
| Dynamical systems reconstruction | ECG | Dstsp | 16.3 | 7 |