Efficient Context Scaling with LongCat ZigZag Attention
About
We introduce LongCat ZigZag Attention (LoZA), a sparse attention scheme designed to convert existing full-attention models into sparse variants under a limited compute budget. In long-context scenarios, LoZA achieves significant speed-ups in both prefill-intensive cases (e.g., retrieval-augmented generation) and decode-intensive cases (e.g., tool-integrated reasoning). Specifically, by applying LoZA to LongCat-Flash during mid-training, we serve LongCat-Flash-Exp as a long-context foundation model that can swiftly process up to 1 million tokens, enabling efficient long-term reasoning and long-horizon agentic capabilities.
Chen Zhang, Yang Bai, Jiahuan Li, Anchun Gui, Keheng Wang, Feifan Liu, Guanyu Wu, Yuwei Jiang, Defei Bu, Li Wei, Haihang Jing, Hongyin Tang, Xin Chen, Xiangzhou Huang, Fengcun Li, Rongxiang Weng, Yulei Qian, Yifan Lu, Yerui Sun, Jingang Wang, Yuchen Xie, Xunliang Cai • 2025
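The abstract does not spell out the ZigZag pattern itself, so the following is only a minimal sketch of how a zigzag-style block-sparse causal mask could be constructed, assuming a pattern that combines a local sliding window with phase-shifted long-range stripes. The function and parameter names (`zigzag_block_mask`, `window`, `stride`, `phase`) are illustrative, not taken from the paper.

```python
# Hypothetical illustration of block-sparse attention; NOT the published
# LoZA pattern, whose details are not given in the abstract above.
import torch

def zigzag_block_mask(num_blocks: int, window: int = 2,
                      stride: int = 4, phase: int = 0) -> torch.Tensor:
    """Boolean [num_blocks, num_blocks] mask over blocks of the sequence.

    Assumed pattern: every query block sees its `window` most recent
    blocks plus periodic long-range "stripes"; shifting `phase` per layer
    or head interleaves (zigzags) coverage of the distant context.
    """
    q = torch.arange(num_blocks).unsqueeze(1)   # query block indices
    k = torch.arange(num_blocks).unsqueeze(0)   # key block indices
    causal = k <= q                             # no attention to the future
    local = (q - k) < window                    # sliding-window blocks
    stripes = (k % stride) == (phase % stride)  # long-range stripe blocks
    return causal & (local | stripes)

def sparse_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor,
                     block_size: int, **mask_kwargs) -> torch.Tensor:
    """Reference (dense) implementation: a real kernel would skip the
    masked blocks instead of computing and discarding their scores."""
    T, d = q.shape
    blk = zigzag_block_mask(T // block_size, **mask_kwargs)
    # Expand the block mask to token resolution, then re-apply token-level
    # causality so tokens cannot see later tokens inside their own block.
    mask = blk.repeat_interleave(block_size, 0).repeat_interleave(block_size, 1)
    mask &= torch.ones(T, T, dtype=torch.bool).tril()
    scores = (q @ k.T / d ** 0.5).masked_fill(~mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

if __name__ == "__main__":
    T, d = 64, 16
    q, k, v = (torch.randn(T, d) for _ in range(3))
    out = sparse_attention(q, k, v, block_size=8, window=2, stride=4, phase=1)
    print(out.shape)  # torch.Size([64, 16])
```

Varying the hypothetical `phase` across layers or heads is one plausible way to interleave coverage of distant blocks; in practice the speed-up comes from a kernel that never materializes the masked blocks at all.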
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Instruction Following | IFEval | Accuracy (0-100) | 88 | 292 |
| Code Generation | HumanEval+ | Pass@1 | 87.2 | 189 |
| General Knowledge | MMLU | Accuracy | 89.6 | 170 |
| Code Generation | MBPP+ | Pass@1 | 79.1 | 122 |
| Long-context Understanding | LongBench v2 | -- | -- | 37 |
| Multilingual Mathematical Reasoning | MGSM | -- | -- | 36 |
| Math | MATH 500 | Accuracy | 98.8 | 25 |
| Code Generation | FullStackBench | Pass@1 | 64.1 | 20 |
| Multilingual Knowledge | MMMLU | Accuracy | 85.2 | 18 |
| General Knowledge | CEval | Score | 89.9 | 13 |
Showing 10 of 24 benchmark results.