Interdomain Attention: Beyond Token-Level Key-Value Memory

About

Transformers and deep state space models (SSMs) sit at opposite ends of a basic design choice: attention routes each query through a growing key-value (KV) cache by content-based matching at quadratic cost, while deep SSMs compress context into a fixed-size recurrent state that is not directly addressed by query-key matching. We propose Interdomain Attention, which integrates an SSM into an attention module through kernel methods: an attention kernel is approximated by a finite feature map, the resulting key features and values are projected onto a shared set of basis functions maintained by a single SSM recurrence, and each query attends to the compressed coefficients through its own feature map, recovering query-conditioned attention over a fixed-size state. The scalable layer is a learned relaxation of this derivation, and we validate its components through ablations. In a 125M to 1.3B autoregressive language-modeling study on FineWeb-Edu at matched recurrent-state budget, Interdomain Attention improves on an SSM token mixer at every scale, surpasses a same-recipe softmax baseline at 1.3B on validation perplexity and on the eight-task commonsense suite, and inherits the length-flat behavior of its fixed-state core out to 3.5x the training context. Ablations indicate that the query-conditioned projection is the main source of the gain.
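The core idea in the abstract, replacing a growing KV cache with a fixed-size state that each query reads through its own feature map, can be illustrated with a small sketch. This is not the authors' layer: it omits the shared basis functions and the learned relaxation, and it assumes a simple positive feature map (`elu(x) + 1`) and a scalar `decay` standing in for the SSM recurrence. All function names are hypothetical. It only shows that, for a kernel of the form k(q, k) = φ(q)·φ(k), the recurrent fixed-state computation matches the explicit sum over the full history.

```python
import math

def feature_map(x):
    """Assumed positive feature map phi(x) = elu(x) + 1, applied elementwise."""
    return [xi + 1.0 if xi > 0 else math.exp(xi) for xi in x]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def recurrent_kernel_attention(queries, keys, values, decay=1.0):
    """Causal kernel attention read out from a fixed-size state.

    The state is S (m x d_v) and z (length m), where m is the feature
    dimension; its size does not grow with sequence length.
    `decay` is a hypothetical scalar stand-in for an SSM transition.
    """
    m = len(feature_map(keys[0]))
    d_v = len(values[0])
    S = [[0.0] * d_v for _ in range(m)]
    z = [0.0] * m
    outputs = []
    for q, k, v in zip(queries, keys, values):
        phi_k = feature_map(k)
        # Recurrent update: fold the new key features and value into the state.
        for i in range(m):
            z[i] = decay * z[i] + phi_k[i]
            for j in range(d_v):
                S[i][j] = decay * S[i][j] + phi_k[i] * v[j]
        # Query-conditioned readout through the query's own feature map.
        phi_q = feature_map(q)
        denom = dot(phi_q, z)
        outputs.append(
            [dot(phi_q, [S[i][j] for i in range(m)]) / denom for j in range(d_v)]
        )
    return outputs

def explicit_kernel_attention(queries, keys, values, decay=1.0):
    """Reference: the same quantity computed over the full key-value history."""
    d_v = len(values[0])
    outputs = []
    for t, q in enumerate(queries):
        phi_q = feature_map(q)
        weights = [
            decay ** (t - s) * dot(phi_q, feature_map(keys[s]))
            for s in range(t + 1)
        ]
        total = sum(weights)
        outputs.append(
            [sum(w * values[s][j] for s, w in enumerate(weights)) / total
             for j in range(d_v)]
        )
    return outputs
```

The recurrent version stores only `S` and `z`, so its memory is independent of sequence length, while the explicit version revisits every past key. Agreement between the two on small inputs is what makes the fixed-state readout a form of query-conditioned attention under these assumptions.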

Naoki Kiyohara, Harrison Bo Hua Zhu, Riccardo El Hassanin, Zhuo Sun, Wenlong Chen, Samir Bhatt, Yingzhen Li • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Language Modeling | WikiText-2 | -- | 2320 |
| Commonsense Reasoning | Commonsense 8 Sub-Tasks | Accuracy (8 Sub-Tasks): 54.54 | 26 |
| Language Modeling | 1.3B 26B-token pre-training corpus (val) | Validation Cross-Entropy: 2.077 | 3 |
| Language Modeling | LAMBADA | LAMBADA Accuracy: 44.36 | 3 |
| Long-context Language Understanding | LongBench 14-subtask configuration | Average Performance (14 Tasks): 12.29 | 3 |
