
The Hidden Attention of Mamba Models

About

The Mamba layer offers an efficient selective state space model (SSM) that is highly effective at modeling multiple domains, including NLP, long-range sequence processing, and computer vision. Selective SSMs are viewed as dual models: they are trained in parallel over the entire sequence via an IO-aware parallel scan and deployed autoregressively. We add a third view and show that such models can also be viewed as attention-driven models. This new perspective enables us to empirically and theoretically compare the underlying mechanisms to those of the self-attention layers in transformers, and allows us to peer inside the inner workings of the Mamba model with explainability methods. Our code is publicly available.
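The attention view rests on unrolling the selective SSM recurrence: since each output is a weighted sum of all earlier inputs, the weights form a causal, attention-like matrix. The following sketch illustrates this equivalence on a toy scalar-state model with invented random values (the per-step scalars `A`, `B`, `C` stand in for the input-dependent discretized SSM parameters; real Mamba layers use multi-dimensional states and channels):

```python
import numpy as np

# Toy selective SSM with a scalar state, illustrating the "hidden attention"
# view. The recurrence
#   h_t = A_t * h_{t-1} + B_t * x_t,    y_t = C_t * h_t
# unrolls into
#   y_t = sum_{s<=t} alpha_{t,s} * x_s,
#   alpha_{t,s} = C_t * (prod_{k=s+1..t} A_k) * B_s,
# i.e. a lower-triangular (causal) attention-like matrix alpha.
rng = np.random.default_rng(0)
T = 5
A = rng.uniform(0.5, 1.0, T)  # input-dependent decay per step (toy scalars)
B = rng.normal(size=T)        # toy input projections
C = rng.normal(size=T)        # toy output projections
x = rng.normal(size=T)        # toy input sequence

# 1) Run the recurrence directly (the autoregressive deployment view).
h, y_rec = 0.0, []
for t in range(T):
    h = A[t] * h + B[t] * x[t]
    y_rec.append(C[t] * h)
y_rec = np.array(y_rec)

# 2) Build the equivalent hidden attention matrix and apply it in one shot.
alpha = np.zeros((T, T))
for t in range(T):
    for s in range(t + 1):
        # np.prod over an empty slice is 1, so alpha[t, t] = C[t] * B[t].
        alpha[t, s] = C[t] * np.prod(A[s + 1 : t + 1]) * B[s]
y_att = alpha @ x

assert np.allclose(y_rec, y_att)  # both views produce the same outputs
```

Because `alpha` is strictly causal and input-dependent, it can be inspected and attributed much like a transformer's attention matrix, which is what enables the explainability comparison above.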

Ameen Ali, Itamar Zimerman, Lior Wolf • 2024

Related benchmarks

Task                      Dataset                           Metric          Result  Rank
Word Alignment            RWTH Gold Alignment de-en (test)  AER             0.7     31
Explanation Faithfulness  Med-BIOS                          Delta AF        5.326   24
Explanation Faithfulness  Emotion                           Delta AF Score  4.706   24
Explanation Faithfulness  SNLI                              Delta AF        0.554   24
Explanation Faithfulness  SST-2                             Delta AF        0.341   24
Token Alignment           IWSLT Fr-En 2017 (test)           AER             66      22
Token Alignment           IWSLT DE→EN 2017 (test)           AER             0.72    22
Copying                   Copying task                      AUC             84      11
Explanation Faithfulness  ImageNet                          Delta AF        2.427   8

Other info

Code
