Diagonal State Spaces are as Effective as Structured State Spaces

About

Modeling long range dependencies in sequential data is a fundamental step towards attaining human-level performance in many modalities such as text, vision, audio and video. While attention-based models are a popular and effective choice in modeling short-range interactions, their performance on tasks requiring long range reasoning has been largely inadequate. In an exciting result, Gu et al. (ICLR 2022) proposed the $\textit{Structured State Space}$ (S4) architecture delivering large gains over state-of-the-art models on several long-range tasks across various modalities. The core proposition of S4 is the parameterization of state matrices via a diagonal plus low rank structure, allowing efficient computation. In this work, we show that one can match the performance of S4 even without the low rank correction and thus assuming the state matrices to be diagonal. Our $\textit{Diagonal State Space}$ (DSS) model matches the performance of S4 on Long Range Arena tasks, speech classification on Speech Commands dataset, while being conceptually simpler and straightforward to implement.

Ankit Gupta, Albert Gu, Jonathan Berant• 2022

Related benchmarks

Task	Dataset	Result
Language Modeling	WikiText-103 (test)	Perplexity41.07	703
Language Modeling	WikiText-103 (val)	PPL39.39	261
Long-range sequence modeling	Long Range Arena (LRA)	Text Accuracy84.8	177
Long-range sequence modeling	Long Range Arena (LRA) (test)	Accuracy (Avg)80.73	163
Long-sequence modeling	Long Range Arena (LRA) v1 (test)	ListOps60.6	66
Hierarchical Reasoning	ListOps Long Range Arena (test)	Accuracy57.6	26
Sequence Modeling	Long Range Arena (val)	ListOps Accuracy57.6	26
Hierarchical reasoning on symbolic sequences	Long ListOps (test)	Accuracy57.6	22
Audio Classification	Speech Commands (SC) Unprocessed signals (RAW)	Accuracy98.2	13
Mathematical logic sequence modeling	Long Range Arena (LRA) ListOps (test)	Accuracy60.6	12

Showing 10 of 20 rows

Other info

Code

Follow for update

@wizwand_team Discord