XAttention: Block Sparse Attention with Antidiagonal Scoring

About

Long-Context Transformer Models (LCTMs) are vital for real-world applications but suffer from high computational costs due to attention's quadratic complexity. Block-sparse attention mitigates this by focusing computation on critical regions, yet existing methods struggle to balance accuracy and efficiency because measuring block importance is itself costly. In this paper, we introduce XAttention, a plug-and-play framework that dramatically accelerates long-context inference in Transformer models using sparse attention. XAttention's key innovation is the insight that the sum of antidiagonal values (i.e., from the lower-left to upper-right) in the attention matrix provides a powerful proxy for block importance. This allows for precise identification and pruning of non-essential blocks, resulting in high sparsity and dramatically accelerated inference. Across comprehensive evaluations on demanding long-context benchmarks, including RULER and LongBench for language, VideoMME for video understanding, and VBench for video generation, XAttention achieves accuracy comparable to full attention while delivering substantial computational gains. We demonstrate up to 13.5x acceleration in attention computation. These results underscore XAttention's ability to unlock the practical potential of block sparse attention, paving the way for scalable and efficient deployment of LCTMs in real-world applications. Code is available at https://github.com/mit-han-lab/x-attention.
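To make the antidiagonal idea concrete, here is a minimal NumPy sketch of how a block could be scored by summing the entries on its antidiagonal. It is an illustration only and is not taken from the x-attention repository: the function names, the use of a fully materialized attention matrix, and the cumulative-threshold selection rule in `select_blocks` are all assumptions made for this example. An efficient implementation would presumably avoid building the full matrix at all.

```python
import numpy as np


def antidiagonal_block_scores(attn: np.ndarray, block: int) -> np.ndarray:
    """Sum the main antidiagonal of every (block x block) tile of `attn`.

    Hypothetical helper for illustration; `attn` is an already-computed
    attention matrix whose sides are multiples of `block`.
    """
    n_q, n_k = attn.shape
    assert n_q % block == 0 and n_k % block == 0
    # tiles[i, j] is the tile at block-row i, block-column j.
    tiles = attn.reshape(n_q // block, block, n_k // block, block)
    tiles = tiles.transpose(0, 2, 1, 3)
    rows = np.arange(block)
    cols = block - 1 - rows  # antidiagonal positions (r, block - 1 - r)
    return tiles[:, :, rows, cols].sum(axis=-1)


def select_blocks(scores: np.ndarray, threshold: float) -> np.ndarray:
    """Per block-row, keep the fewest top-scoring blocks whose normalized
    scores reach `threshold`. The selection rule is an assumption here.

    Expects non-negative scores with a positive sum in every row.
    """
    probs = scores / scores.sum(axis=1, keepdims=True)
    order = np.argsort(-probs, axis=1)  # highest score first
    sorted_probs = np.take_along_axis(probs, order, axis=1)
    mass_before = np.cumsum(sorted_probs, axis=1) - sorted_probs
    keep_sorted = mass_before < threshold
    mask = np.zeros(scores.shape, dtype=bool)
    np.put_along_axis(mask, order, keep_sorted, axis=1)
    return mask


# Toy 4x4 "attention" matrix split into 2x2 blocks.
attn = np.array(
    [
        [0.0, 1.0, 0.0, 0.0],
        [2.0, 0.0, 0.0, 3.0],
        [0.0, 0.0, 0.0, 4.0],
        [0.0, 0.0, 5.0, 0.0],
    ]
)
scores = antidiagonal_block_scores(attn, block=2)
mask = select_blocks(scores, threshold=0.9)
```

In this toy input only the two diagonal blocks carry weight on their antidiagonals, so `mask` keeps those and prunes the rest. Attention would then be computed only for the blocks where `mask` is true.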

Ruyi Xu, Guangxuan Xiao, Haofeng Huang, Junxian Guo, Song Han • 2025

Related benchmarks

Task | Dataset | Result | Rank
Mathematical Reasoning | GSM8K | Accuracy: 84.15 | 983
Code Generation | HumanEval | Pass@1: 80.49 | 850
Video Understanding | VideoMME | Overall Score: 70.81 | 192
Long-context Understanding | LongBench | Overall Average Score: 39.68 | 115
Video Generation | VBench | -- | 102
Video Understanding | Video-MME without subtitles | Overall Score: 63.9 | 67
Long-context Understanding | RULER | Performance @ 4K Context: 97.27 | 65
Long-context language modeling | LongBench-E 1.0 (test) | S-Doc QA Perf.: 48.82 | 37
Long-context language modeling evaluation | HELMET | Average Sparsity: 41.96 | 28
Long Video Understanding | VNBench | Retrieval E Accuracy: 90.67 | 21
Showing 10 of 24 rows
