
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality

About

While Transformers have been the main architecture behind deep learning's success in language modeling, state-space models (SSMs) such as Mamba have recently been shown to match or outperform Transformers at small to medium scale. We show that these families of models are actually quite closely related, and develop a rich framework of theoretical connections between SSMs and variants of attention, connected through various decompositions of a well-studied class of structured semiseparable matrices. Our state space duality (SSD) framework allows us to design a new architecture (Mamba-2) whose core layer is a refinement of Mamba's selective SSM that is 2-8X faster, while continuing to be competitive with Transformers on language modeling.

Tri Dao, Albert Gu • 2024
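
As a rough illustration of the duality the abstract refers to, the sketch below computes the same sequence transformation two ways for the scalar-decay case: once as a linear-time recurrence (the SSM view) and once as a masked matrix multiplication with a lower-triangular 1-semiseparable mask (the attention view). The shapes, variable names, and NumPy implementation are illustrative assumptions, not code from the paper.

```python
import numpy as np

# Sketch of the SSD duality for the scalar-decay case: the same map from X to Y
# computed (1) as a recurrent selective SSM and (2) as masked "attention" whose
# mask is a 1-semiseparable matrix built from cumulative products of the decays.
T, N, P = 6, 4, 3                    # sequence length, state size, head dimension
rng = np.random.default_rng(0)

a = rng.uniform(0.5, 1.0, size=T)    # per-step scalar decay a_t (data-dependent in Mamba-2)
B = rng.standard_normal((T, N))      # input projections B_t
C = rng.standard_normal((T, N))      # output projections C_t
X = rng.standard_normal((T, P))      # input sequence x_t

# (1) Linear-time recurrent form: h_t = a_t * h_{t-1} + B_t x_t^T,  y_t = C_t^T h_t
h = np.zeros((N, P))
Y_rec = np.zeros((T, P))
for t in range(T):
    h = a[t] * h + np.outer(B[t], X[t])
    Y_rec[t] = C[t] @ h

# (2) Quadratic "attention" form: Y = (L ∘ (C B^T)) X, where
# L[i, j] = a_{j+1} * ... * a_i for i >= j and 0 otherwise.
L = np.zeros((T, T))
for i in range(T):
    for j in range(i + 1):
        L[i, j] = np.prod(a[j + 1 : i + 1])
Y_attn = (L * (C @ B.T)) @ X

assert np.allclose(Y_rec, Y_attn)    # both views give the same output
```

The two views trade off differently: the recurrent form runs in time linear in sequence length, while the quadratic form exposes large matrix multiplications that map well onto hardware; exploiting both is the idea behind the paper's SSD algorithm.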

Related benchmarks

Task | Dataset | Result | Rank
Commonsense Reasoning | HellaSwag | Accuracy 53.9 | 1460
Multi-task Language Understanding | MMLU | Accuracy 23 | 842
Commonsense Reasoning | WinoGrande | Accuracy 72.3 | 776
Question Answering | ARC Challenge | Accuracy 46.8 | 749
Commonsense Reasoning | PIQA | Accuracy 72.8 | 647
Language Modeling | WikiText | PPL 15.9 | 479
Question Answering | ARC Easy | Accuracy 77.6 | 386
Physical Commonsense Reasoning | PIQA | Accuracy 81 | 329
Long-context Language Understanding | LongBench | M-Avg 31.57 | 219
Multi-task Language Understanding | MMLU | Accuracy 33.2 | 206
Showing 10 of 61 rows
