Forget Attention: Importance-Aware Attention Is All You Need

About

Combining attention's global retrieval with the sequential importance signal of state space models (SSMs) is the open challenge of hybrid language modeling. Transformers see everywhere but cannot prioritize; SSMs know what matters but cannot revisit. Existing hybrids -- Jamba (block level) and Hymba (head level) -- place the two in separate compartments, so neither informs the other during the attention computation itself. We propose SISA (SSM-Informed Softmax Attention), which adds an SSM-derived importance term directly inside the attention score and realizes the full operation as a single SDPA call on augmented query/key vectors -- no recurrent state, no custom kernel. At 152M parameters / 5B tokens, SISA reaches 17.3% LAMBADA-greedy accuracy (vs. 13.9% for the Transformer and 15.5% for Mamba-3) and attains 100% NIAH accuracy from step 1K, 7x faster than the Transformer's retrieval convergence; at 369M parameters, Mamba-3 leads on LAMBADA while SISA preserves perfect NIAH and stock-SDPA execution. SISA thus defines a third design axis for SSM-attention hybrids -- score-level fusion -- beyond the block-level and head-level paradigms that have dominated the field.
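
The abstract does not spell out the form of the importance term, so the following is only a minimal sketch under an assumption: that the term is an additive per-key bias b_j, giving scores q_i . k_j / sqrt(d) + b_j. Under that assumption, one extra dimension on the queries and keys is enough for an unmodified scaled-dot-product attention to reproduce the biased scores. The names sdpa, biased_attention, and augment are hypothetical, the importance values are hard-coded placeholders rather than the output of an SSM, and causal masking is omitted.

```python
import math


def _softmax_mix(scores, values):
    """Softmax over scores, then the weighted sum of the value rows."""
    peak = max(scores)
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    width = len(values[0])
    return [sum(w / total * v[c] for w, v in zip(weights, values))
            for c in range(width)]


def sdpa(queries, keys, values):
    """Plain softmax attention with the usual 1/sqrt(dim) scaling.

    Stands in for a stock SDPA routine; it knows nothing about importance.
    """
    dim = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
                  for k in keys]
        out.append(_softmax_mix(scores, values))
    return out


def biased_attention(queries, keys, values, importance):
    """Reference: explicitly add importance[j] to every score for key j."""
    dim = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) + bias
                  for k, bias in zip(keys, importance)]
        out.append(_softmax_mix(scores, values))
    return out


def augment(queries, keys, importance):
    """Fold the per-key bias into one extra query/key dimension.

    With d the original width, sdpa divides by sqrt(d + 1), so queries are
    rescaled by sqrt((d + 1) / d) and given a constant slot sqrt(d) (also
    rescaled); each key simply carries its bias in the new slot.
    """
    d = len(queries[0])
    rescale = math.sqrt((d + 1) / d)
    aug_q = [[rescale * x for x in q] + [rescale * math.sqrt(d)]
             for q in queries]
    aug_k = [list(k) + [b] for k, b in zip(keys, importance)]
    return aug_q, aug_k


# Toy inputs: 2 queries, 3 keys/values, width 2 (all values are made up).
queries = [[0.5, -1.0], [1.5, 0.25]]
keys = [[1.0, 0.0], [0.0, 1.0], [-0.5, 0.5]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
importance = [0.0, 2.0, -1.0]  # placeholder for an SSM-derived signal

aug_q, aug_k = augment(queries, keys, importance)
fused = sdpa(aug_q, aug_k, values)            # single unmodified SDPA call
reference = biased_attention(queries, keys, values, importance)
max_gap = max(abs(a - b)
              for row_f, row_r in zip(fused, reference)
              for a, b in zip(row_f, row_r))
print("max abs difference:", max_gap)
```

Because the bias depends only on the key position, it fits in a single extra key coordinate paired with a constant query coordinate, which is why no custom kernel would be needed in this simplified setting. Whether SISA's actual augmentation takes this form is not stated in the abstract.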

Suhyeong Shin, Yeongwook Yang • 2026

Related benchmarks

Task	Dataset	Metric	Result
Commonsense Reasoning	HellaSwag	HellaSwag Accuracy	26.9	897
Question Answering	ARC Easy	--	--	597
Sentence Completion	HellaSwag	Accuracy	26.9	440
Coreference Resolution	WinoGrande	Accuracy	52.5	72
Pronoun Resolution	WinoGrande	Accuracy	52.5	64
Long-context retrieval	Needle-in-a-Haystack	Retrieval Accuracy	100	29
Science QA	ARC Easy	Accuracy	35.8	23
Needle-In-A-Haystack Retrieval	NIAH	NIAH Score	100	14
Last-word prediction	LAMBADA	Accuracy	17.3	7
