
TriAttention: Efficient Long Reasoning with Trigonometric KV Compression

About

Extended reasoning in large language models (LLMs) creates severe KV cache memory bottlenecks. Leading KV cache compression methods estimate the importance of each KV entry using attention scores from recent post-RoPE queries. However, RoPE rotates queries with position, so very few queries are representative, which leads to poor top-key selection and unstable reasoning. To avoid this issue, we turn to the pre-RoPE space, where we observe that Q and K vectors are highly concentrated around fixed non-zero centers and remain stable across positions, a property we call Q/K concentration. We show that this concentration causes queries to preferentially attend to keys at specific distances (e.g., nearest keys), with the centers determining which distances are preferred via a trigonometric series. Based on this, we propose TriAttention, which estimates key importance by leveraging these centers. Via the trigonometric series, the distance preference characterized by the centers is used to score keys according to their positions, and Q/K norms serve as an additional signal for importance estimation. On AIME25 with 32K-token generation, TriAttention matches Full Attention reasoning accuracy while achieving 2.5x higher throughput or 10.7x KV memory reduction, whereas leading baselines achieve only about half the accuracy at the same efficiency. TriAttention also enables OpenClaw deployment on a single consumer GPU, where long context would otherwise cause out-of-memory errors with Full Attention.
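The link between pre-RoPE vectors and a trigonometric series can be checked numerically. The sketch below is an illustration of the underlying RoPE identity, not the TriAttention implementation: it assumes the interleaved-pair RoPE convention, a head dimension of 64, and the common base of 10000, and the names `q_center`, `k_center`, and `preference` are hypothetical stand-ins (random vectors rather than measured centers). Under those assumptions, the post-RoPE logit between a query and a key depends only on their distance and equals a sum of cosines whose amplitudes and phases come from the pre-RoPE vectors.

```python
import numpy as np

def to_complex(x):
    """View consecutive pairs (x[2j], x[2j+1]) as complex numbers."""
    return x[0::2] + 1j * x[1::2]

def rope(x, pos, theta):
    """Apply RoPE: rotate the j-th pair of x by the angle pos * theta[j]."""
    return to_complex(x) * np.exp(1j * pos * theta)

def logit_post_rope(q, k, q_pos, k_pos, theta):
    """Dot product of q and k after each is rotated to its own position."""
    qr, kr = rope(q, q_pos, theta), rope(k, k_pos, theta)
    return float(np.real(np.sum(qr * np.conj(kr))))

def logit_trig_series(q, k, distance, theta):
    """The same logit as a cosine series in distance = q_pos - k_pos."""
    z = to_complex(q) * np.conj(to_complex(k))
    return float(np.sum(np.abs(z) * np.cos(theta * distance + np.angle(z))))

# Assumed setup: head dimension 64 with the common RoPE base of 10000.
head_dim = 64
theta = 10000.0 ** (-np.arange(0, head_dim, 2) / head_dim)

rng = np.random.default_rng(0)
q_center = rng.normal(size=head_dim)  # stand-in for a pre-RoPE query center
k_center = rng.normal(size=head_dim)  # stand-in for a pre-RoPE key center

# Distance-preference curve implied by the two centers alone.
distances = np.arange(0, 512)
preference = np.array(
    [logit_trig_series(q_center, k_center, d, theta) for d in distances]
)
```

If queries and keys stay close to such centers, a curve like `preference` indicates which distances tend to receive high logits without needing the actual future queries. How TriAttention combines this positional score with Q/K norms is not specified in the abstract, so that step is left out here.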

Weian Mao, Xi Lin, Wei Huang, Yuxin Xie, Tianfu Fu, Bohan Zhuang, Song Han, Yukang Chen • 2026

Related benchmarks

Task | Dataset | Result | Rank
Mathematical Reasoning | AIME 2024 (test) | Accuracy 46.7 | 159
Long-context Understanding | LongBench (test) | Avg Score 48.1 | 136
Mathematical Reasoning | AIME 2024 | Accuracy 59.2 | 104
Mathematical Reasoning | AIME 2025 | Accuracy 49.2 | 58
Reasoning Fidelity | AIME 2024 (ref6) | Top-1 Acc 89.4 | 6
Reasoning Fidelity | AIME 2025 (ref6) | Top-1 Accuracy 0.899 | 6
Retrieval | RULER 4k context | RULER Average Score 66.1 | 4
Prompt-heavy Aggregate Performance Evaluation | Prompt-heavy Aggregate qasper, multi_news, hotpotqa, musique, 2wikimqa | Weighted Top-1 58.42 | 4
Mathematical Reasoning | MATH 500 | Accuracy 68.4 | 2
Mathematical Reasoning | AIME24 | Accuracy 54.6 | 2
