Long Expressive Memory for Sequence Modeling
About
We propose a novel method called Long Expressive Memory (LEM) for learning long-term sequential dependencies. LEM is gradient-based, can efficiently process sequential tasks with very long-term dependencies, and is sufficiently expressive to learn complicated input-output maps. To derive LEM, we consider a system of multiscale ordinary differential equations, together with a suitable time-discretization of this system. For LEM, we derive rigorous bounds showing mitigation of the exploding and vanishing gradients problem, a well-known challenge for gradient-based recurrent sequential learning methods. We also prove that LEM can approximate a large class of dynamical systems to high accuracy. Our empirical results, ranging from image and time-series classification through dynamical systems prediction to speech recognition and language modeling, demonstrate that LEM outperforms state-of-the-art recurrent neural networks, gated recurrent units, and long short-term memory models.
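The multiscale-ODE view described above can be sketched as a recurrence with two hidden states evolving on learned, input-dependent time scales, updated by an explicit-Euler-style step. The sketch below is illustrative only: the parameter names (`W1`, `V1`, biases, etc.), the sigmoid gating, and the tanh nonlinearity are assumptions in the spirit of the description, not taken from this page.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lem_step(y, z, u, params, dt=1.0):
    """One step of a two-scale gated recurrence (illustrative LEM-style cell).

    y, z : the two hidden states (fast/slow scales), shape (d,)
    u    : current input, shape (d,)
    params : 12 arrays (W1, V1, b1, W2, V2, b2, Wz, Vz, bz, Wy, Vy, by)
    dt   : base step size of the time discretization
    """
    W1, V1, b1, W2, V2, b2, Wz, Vz, bz, Wy, Vy, by = params
    # Learned, input-dependent time steps in (0, dt): these realize the
    # "multiscale" aspect -- each unit adapts its own effective time scale.
    dt_z = dt * sigmoid(W1 @ y + V1 @ u + b1)
    dt_y = dt * sigmoid(W2 @ y + V2 @ u + b2)
    # Explicit-Euler-style convex updates of the two state variables.
    z_new = (1.0 - dt_z) * z + dt_z * np.tanh(Wz @ y + Vz @ u + bz)
    y_new = (1.0 - dt_y) * y + dt_y * np.tanh(Wy @ z_new + Vy @ u + by)
    return y_new, z_new
```

Because each update is a convex combination of the previous state and a bounded nonlinearity, the states stay bounded over long sequences, which is the intuition behind the gradient bounds mentioned above.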
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Character-level prediction | PTB (test) | BPC | 1.25 | 42 |
| Sequential image classification | MNIST ordered pixel-by-pixel 1.0 (test) | Accuracy | 96.6 | 32 |
| Dynamical systems reconstruction | Lorenz-63 3d | Dstsp | 0.39 | 23 |
| Keyword spotting | Google Speech Commands Google12 V2 (test) | Accuracy | 95.7 | 22 |
| Word-level prediction | PTB word-level (test) | Perplexity | 72.8 | 19 |
| Sequential image recognition | sMNIST | Test accuracy | 99.5 | 16 |
| Heart-rate prediction | PPG data, TSR archive (test) | Test L2 error | 0.85 | 13 |
| Sequential image recognition | nCIFAR-10 | Test accuracy | 60.5 | 8 |
| Dynamical systems reconstruction | Lorenz-96 20d | Dstsp | 7.2 | 8 |
| Dynamical systems reconstruction | ECG | Dstsp | 16.3 | 7 |