Less Is More: Fast and Accurate Reasoning with Cross-Head Unified Sparse Attention

About

Large reasoning models achieve strong performance through test-time scaling, but this incurs substantial computational overhead due to long decoding from short prompts. While sparse attention can reduce latency and memory usage, existing methods often either degrade reasoning accuracy, because selection errors accumulate over long generation horizons, or require costly retraining. We introduce LessIsMore, a training-free sparse attention mechanism for long-horizon reasoning. Our key insight is that token importance in reasoning is global and stable: critical tokens are largely shared across attention heads and remain stable over decoding steps. Guided by this structure, LessIsMore enforces cross-head unified token selection and preserves recent context via a stable recency window, yielding a globally consistent token set that can be reused across layers. Across multiple model families and challenging reasoning benchmarks, LessIsMore matches or improves accuracy while attending to substantially fewer tokens. With kernel-level optimizations, LessIsMore achieves up to 1.6× end-to-end decoding speedup and up to 1.72× faster sparse attention computation, and additional long-context results demonstrate the generality of the approach. Code is available at https://github.com/DerrickYLJ/LessIsMore.
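The selection idea described above (one token set shared by all heads, plus a recency window that is always kept) can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the released LessIsMore implementation: the function names `unified_token_selection` and `sparse_attention` are hypothetical, the cross-head aggregation is assumed to be a plain sum of per-head scores, and the `budget` and `recency_window` parameters and toy scores are illustrative only.

```python
import math
from typing import List, Sequence


def unified_token_selection(
    head_scores: Sequence[Sequence[float]],
    budget: int,
    recency_window: int,
) -> List[int]:
    """Return one sorted list of token indices shared by all heads.

    head_scores[h][t] is head h's importance score for cached token t.
    Assumption (hypothetical, not from the paper): cross-head importance
    is the plain sum of per-head scores.
    """
    num_tokens = len(head_scores[0])
    if budget >= num_tokens:
        return list(range(num_tokens))  # Nothing to prune.

    # Recency window: the newest tokens are always kept.
    window = min(recency_window, budget)
    recent = set(range(num_tokens - window, num_tokens))

    # Aggregate scores across heads so every head shares one ranking.
    combined = [sum(head[t] for head in head_scores) for t in range(num_tokens)]

    # Fill the remaining budget with the highest-scoring older tokens.
    older = [t for t in range(num_tokens) if t not in recent]
    older.sort(key=lambda t: combined[t], reverse=True)
    return sorted(recent.union(older[: budget - window]))


def sparse_attention(
    query: Sequence[float],
    keys: Sequence[Sequence[float]],
    values: Sequence[Sequence[float]],
    selected: Sequence[int],
) -> List[float]:
    """Scaled dot-product attention for one head over the selected tokens only."""
    scale = 1.0 / math.sqrt(len(query))
    logits = [scale * sum(q * k for q, k in zip(query, keys[t])) for t in selected]
    peak = max(logits)  # Subtract the max for a numerically stable softmax.
    weights = [math.exp(x - peak) for x in logits]
    total = sum(weights)
    dim = len(values[0])
    return [
        sum(w * values[t][d] for w, t in zip(weights, selected)) / total
        for d in range(dim)
    ]


if __name__ == "__main__":
    # Two heads scoring six cached tokens (toy numbers, not from the paper).
    scores = [
        [0.9, 0.1, 0.0, 0.3, 0.2, 0.1],
        [0.8, 0.0, 0.2, 0.4, 0.1, 0.3],
    ]
    selected = unified_token_selection(scores, budget=4, recency_window=2)
    print(selected)  # → [0, 3, 4, 5]

    # Every head attends to the same index set; per the abstract, the set can
    # also be reused across layers, so selection need not run per head.
    keys = [[float(t), 1.0] for t in range(6)]
    values = [[float(t), -float(t)] for t in range(6)]
    for query in ([1.0, 0.0], [0.0, 1.0]):  # one query per head
        print(sparse_attention(query, keys, values, selected))
```

Because the recency window is reserved first, only `budget - recency_window` slots remain for older tokens; a practical version would operate on batched GPU tensors with a fused kernel rather than Python lists.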

Lijie Yang, Zhihao Zhang, Arti Jain, Shijie Cao, Baihong Yuan, Yiwei Chen, Zhihao Jia, Ravi Netravali • 2025

Related benchmarks

Task	Dataset	Metric	Result	
Mathematical Reasoning	MATH 500	Accuracy (Acc)	95.14	543
Mathematical Reasoning	AIME 24	Accuracy	80.39	318
Mathematical Reasoning	AIME 2024 (test)	Accuracy	76.7	209
Long-context Understanding	LongBench	Accuracy	91.8	60
Science Question Answering	GPQA Diamond	Accuracy	65.15	59
Long-context evaluation	RULER 16k	Total Score	38.27	59
Long-context evaluation	RULER 32k	Overall Score	24.21	49
Long-context evaluation	RULER 4k	Score	79.67	35
Long-context evaluation	RULER 8k	Score	54.49	35
Long-context retrieval	Needle-in-the-Haystack 10k-context	Accuracy	100	30

Showing 10 of 20 rows
