Every Attention Matters: An Efficient Hybrid Architecture for Long-Context Reasoning

About

This technical report presents the Ring-linear model series, comprising Ring-mini-linear-2.0 and Ring-flash-linear-2.0. Ring-mini-linear-2.0 has 16B total parameters with 957M activated, while Ring-flash-linear-2.0 has 104B total parameters with 6.1B activated. Both models adopt a hybrid architecture that integrates linear attention and softmax attention, significantly reducing I/O and computational overhead in long-context inference. Compared with a 32B-parameter dense model, the series reduces inference cost to 1/10; compared with the original Ring series, cost is reduced by over 50%. Through systematic exploration of the ratio between the two attention mechanisms in the hybrid architecture, we identify the currently optimal model structure. In addition, our self-developed high-performance FP8 operator library, linghe, improves overall training efficiency by 50%. Because the training and inference engine operators are closely aligned, the models can undergo long-term, stable, and highly efficient optimization during the reinforcement learning phase, consistently maintaining SOTA performance across multiple challenging complex reasoning benchmarks.

Ling Team, Bin Han, Caizhi Tang, Chen Liang, Donghao Zhang, Fan Yuan, Feng Zhu, Jie Gao, Jingyu Hu, Longfei Li, Meng Li, Mingyang Zhang, Peijie Jiang, Peng Jiao, Qian Zhao, Qingyuan Yang, Wenbo Shen, Xinxing Yang, Yalin Zhang, Yankun Ren, Yao Zhao, Yibo Cao, Yixuan Sun, Yue Zhang, Yuchen Fang, Zibin Lin, Zixuan Cheng, Jun Zhou • 2025
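
The abstract attributes the long-context savings to mixing linear attention with softmax attention. The sketch below illustrates the general idea only and is not the Ring-linear implementation: it assumes a generic, unnormalized linear attention with an identity feature map and no decay, and all function names are hypothetical. It shows that the linear variant carries a fixed-size state per head, whereas softmax attention must read a key/value cache that grows with sequence length.

```python
import numpy as np


def softmax_attention_step(q, K_cache, V_cache):
    """One decoding step of softmax attention; reads the whole K/V cache."""
    scores = K_cache @ q / np.sqrt(q.shape[0])   # shape (t,)
    weights = np.exp(scores - scores.max())      # numerically stable softmax
    weights /= weights.sum()
    return weights @ V_cache                     # shape (d_v,)


def linear_attention_step(q, k, v, state):
    """One decoding step of unnormalized linear attention (illustrative).

    `state` is a (d_k, d_v) matrix holding the running sum of k v^T,
    so the per-step cost does not depend on how many tokens came before.
    """
    state = state + np.outer(k, v)
    return q @ state, state


def linear_attention_recurrent(Q, K, V):
    """Run the recurrence over a full sequence, token by token."""
    state = np.zeros((K.shape[1], V.shape[1]))
    outputs = []
    for q, k, v in zip(Q, K, V):
        out, state = linear_attention_step(q, k, v, state)
        outputs.append(out)
    return np.stack(outputs)


def linear_attention_parallel(Q, K, V):
    """Reference form: causally masked (Q K^T) V with no softmax."""
    mask = np.tril(np.ones((Q.shape[0], Q.shape[0])))
    return ((Q @ K.T) * mask) @ V


def cache_floats(seq_len, d_k, d_v):
    """Floats kept per head: growing K/V cache vs. constant linear state."""
    return {
        "softmax_kv_cache": seq_len * (d_k + d_v),
        "linear_state": d_k * d_v,
    }
```

Under these assumptions, the recurrent and masked-parallel forms produce the same outputs, and `cache_floats` makes the memory gap explicit: the softmax cache scales with `seq_len` while the linear state does not. A hybrid model, as described in the abstract, keeps some softmax layers and replaces the rest with linear ones; the ratio the authors settled on is not stated in this summary.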

Related benchmarks

Task	Dataset	Result
Mathematics	GSM8K	GSM8K Score 72.44	87
General Knowledge	MMLU-Pro	MMLU-Pro General Knowledge Score 38.83	56
General Knowledge	CMMLU	Accuracy 68.41	50
General Knowledge	MMLU	--	39
Long-context retrieval	RULER 16k	Score 78.06	34
Long-context retrieval	RULER 128k	Score 67.98	23
Math	CMath	Score 79.09	22
Long-context retrieval	RULER 64K context	Accuracy 73.5	19
General Knowledge	CEval	Score 67.42	19
Long-context retrieval	RULER 32k	RULER 32K Retrieval Score 76.48	15

Showing 10 of 13 rows
