Kimi Linear: An Expressive, Efficient Attention Architecture
About
We introduce Kimi Linear, a hybrid linear attention architecture that, for the first time, outperforms full attention under fair comparisons across various scenarios, including short-context, long-context, and reinforcement learning (RL) scaling regimes. At its core lies Kimi Delta Attention (KDA), an expressive linear attention module that extends Gated DeltaNet with a finer-grained gating mechanism, enabling more effective use of limited finite-state RNN memory. Our bespoke chunkwise algorithm achieves high hardware efficiency through a specialized variant of Diagonal-Plus-Low-Rank (DPLR) transition matrices, which substantially reduces computation compared to the general DPLR formulation while remaining more consistent with the classical delta rule. We pretrain a Kimi Linear model with 3B activated parameters and 48B total parameters, based on a layerwise hybrid of KDA and Multi-Head Latent Attention (MLA). Our experiments show that, with an identical training recipe, Kimi Linear outperforms full MLA by a sizeable margin across all evaluated tasks, while reducing KV cache usage by up to 75% and achieving up to 6 times the decoding throughput for a 1M context. These results demonstrate that Kimi Linear can be a drop-in replacement for full attention architectures with superior performance and efficiency, including on tasks with longer input and output lengths. To support further research, we open-source the KDA kernel and vLLM implementations, and release the pre-trained and instruction-tuned model checkpoints.
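To make "delta rule with finer-grained gating" concrete, here is a minimal single-head sketch of one recurrent step. It assumes a common way of writing a gated delta rule, with the scalar forget gate replaced by a per-channel decay vector. The recurrence form, the function name `kda_step`, and the symbols `S`, `alpha`, and `beta` are illustrative assumptions, not taken from the abstract, and this is not the chunkwise kernel the authors release.

```python
# Hypothetical sketch of a gated delta-rule step with per-channel decay.
# Assumed recurrence (not quoted from the source):
#   S_t = (I - beta * k k^T) @ diag(alpha) @ S_{t-1} + beta * k v^T
#   o_t = S_t^T q
# S is a d_k x d_v matrix stored as a list of rows: a fixed-size memory.

def kda_step(S, q, k, v, alpha, beta):
    d_k, d_v = len(S), len(S[0])
    # 1) Fine-grained gating: each key channel i decays at its own rate alpha[i].
    S = [[alpha[i] * S[i][j] for j in range(d_v)] for i in range(d_k)]
    # 2) What the decayed memory currently returns for key k (S^T k).
    pred = [sum(k[i] * S[i][j] for i in range(d_k)) for j in range(d_v)]
    # 3) Delta rule: move the stored value for k toward v by step size beta.
    #    S + beta * k (v - pred)^T equals (I - beta k k^T) S + beta k v^T.
    S = [[S[i][j] + beta * k[i] * (v[j] - pred[j]) for j in range(d_v)]
         for i in range(d_k)]
    # 4) Read out with the query.
    o = [sum(q[i] * S[i][j] for i in range(d_k)) for j in range(d_v)]
    return S, o


def zeros(d_k, d_v):
    return [[0.0] * d_v for _ in range(d_k)]


if __name__ == "__main__":
    S = zeros(2, 2)
    # Write value [1, 2] under the unit-norm key [1, 0].
    S, _ = kda_step(S, q=[1.0, 0.0], k=[1.0, 0.0], v=[1.0, 2.0],
                    alpha=[1.0, 1.0], beta=1.0)
    # Overwrite the same key with [3, 4]; the query then reads the new value.
    S, o = kda_step(S, q=[1.0, 0.0], k=[1.0, 0.0], v=[3.0, 4.0],
                    alpha=[0.9, 0.5], beta=1.0)
    print(o)
```

With a unit-norm key and `beta = 1`, the update replaces whatever was stored under that key, whatever the decay. The memory stays at `d_k x d_v` entries no matter how many tokens have been processed, which is the property that lets a linear attention layer avoid a growing KV cache.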
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Commonsense Reasoning | WinoGrande | -- | -- | 1442 |
| Question Answering | ARC Challenge | Accuracy (ARC) | 33.19 | 598 |
| Language Modeling | LAMBADA | Accuracy | 49.21 | 412 |
| Question Answering | OpenBookQA | Accuracy | 34.2 | 305 |
| Multitask Language Understanding | MMLU | Accuracy | 56.31 | 263 |
| Language Modeling | WikiText | Word Perplexity | 16.01 | 234 |
| Commonsense Reasoning | PIQA | Accuracy | 71.11 | 213 |
| Question Answering | ARC Easy | Accuracy | 62.12 | 210 |
| Question Answering | BoolQ | Accuracy | 55.29 | 201 |
| Mathematical Reasoning | GSM-8K | Accuracy | 37.45 | 107 |