SepSeq: A Training-Free Framework for Long Numerical Sequence Processing in LLMs

About

While transformer-based Large Language Models (LLMs) theoretically support massive context windows, they suffer from severe performance degradation when processing long numerical sequences. We attribute this failure to attention dispersion in the Softmax mechanism, which prevents the model from concentrating attention on the relevant tokens. To overcome this, we propose Separate Sequence (SepSeq), a training-free, plug-and-play framework that mitigates dispersion by strategically inserting separator tokens. Mechanistically, we demonstrate that separator tokens act as attention sinks, recalibrating attention to focus on local segments while preserving global context. Extensive evaluations on nine widely adopted LLMs confirm the effectiveness of our approach: SepSeq yields an average relative accuracy improvement of 35.6% across diverse domains while reducing total inference token consumption by 16.4% on average.
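To make the separator-insertion idea concrete, here is a minimal sketch of how a long numerical sequence might be split into local segments before being placed in a prompt. The abstract does not state which separator token SepSeq uses or how insertion points are chosen, so the function name `insert_separators`, the fixed `segment_len`, and the `" | "` separator are all assumptions made for illustration, not the authors' method.

```python
def insert_separators(values, segment_len=8, sep=" | ", item_sep=", "):
    """Join `values` into a string, placing `sep` between fixed-size segments.

    Hypothetical helper: the fixed interval and the separator string are
    illustrative assumptions, not the placement strategy from the paper.
    """
    if segment_len < 1:
        raise ValueError("segment_len must be at least 1")
    # Slice the sequence into consecutive chunks of at most `segment_len` items.
    segments = [
        item_sep.join(str(v) for v in values[i:i + segment_len])
        for i in range(0, len(values), segment_len)
    ]
    # Separators appear only between segments, never at the ends.
    return sep.join(segments)


prompt_sequence = insert_separators([3, 14, 15, 92, 65, 35, 89], segment_len=3)
print(prompt_sequence)  # 3, 14, 15 | 92, 65, 35 | 89
```

Because the transformation only rewrites the input text, a sketch like this needs no access to model weights, which is consistent with the training-free, plug-and-play framing above.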

Jie Sun, Yu Liu, Lu Han, Qiwen Deng, Xiang Shu, Yang Xiao, Xingyu Lu, Jun Zhou, Pengfei Liu, Lintao Ma, Jiancan Wu, Xiang Wang • 2026

Related benchmarks

Task	Dataset	Result
Sequence Understanding	Sequence Understanding Benchmark (average of 10 tasks)	Answer Rate: 100	40
Counting	Counting	Accuracy: 76.7	7
number-list	number-list	Answer Rate: 79.1	4
number-string	number-string	Answer Rate: 99	4
stock	Stock	Answer Rate: 73.6	4
weather	Weather	Answer Rate: 76.8	4
Indexing	indexing	Answer Rate: 100	4
max-float	max-float	Answer Rate: 100	4
max-int	max-int	Answer Rate: 100	4
min-float	min-float	Answer Rate: 100	4

Showing 10 of 11 rows
