
TokenSwift: Lossless Acceleration of Ultra Long Sequence Generation

About

Generating ultra-long sequences with large language models (LLMs) has become increasingly important but remains highly time-intensive, particularly for sequences up to 100K tokens. While traditional speculative decoding methods exist, simply extending their generation limits fails to accelerate the process and can even be detrimental. Through an in-depth analysis, we identify three major challenges hindering efficient generation: frequent model reloading, dynamic key-value (KV) management, and repetitive generation. To address these issues, we introduce TokenSwift, a novel framework designed to substantially accelerate the generation of ultra-long sequences while preserving the target model's inherent quality. Experimental results demonstrate that TokenSwift achieves over a 3× speedup across models of varying scales (1.5B, 7B, 8B, 14B) and architectures (MHA, GQA). This acceleration translates to hours of time saved on ultra-long sequence generation, establishing TokenSwift as a scalable and effective solution at unprecedented lengths. Code can be found at https://github.com/bigai-nlco/TokenSwift.
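For intuition, the framework builds on the draft-and-verify pattern common to speculative approaches: a cheap drafter proposes several tokens, and the target model verifies them so the final output is exactly what the target model would have produced. The sketch below shows only that generic pattern, not TokenSwift's actual algorithm (which additionally handles KV-cache management and repetition); `draft_next_tokens` and `target_next_token` are hypothetical stand-ins, and greedy acceptance is assumed for simplicity.

```python
# Minimal sketch of lossless draft-and-verify generation (greedy decoding).
# Not TokenSwift's full method; function names are illustrative placeholders.

from typing import Callable, List

def speculative_generate(
    prompt: List[int],
    draft_next_tokens: Callable[[List[int], int], List[int]],  # cheap drafter
    target_next_token: Callable[[List[int]], int],             # target model (greedy)
    max_new_tokens: int,
    k: int = 4,                                                # draft tokens per step
) -> List[int]:
    seq = list(prompt)
    produced = 0
    while produced < max_new_tokens:
        draft = draft_next_tokens(seq, k)
        # Accept the longest draft prefix the target itself would have emitted,
        # so the output distribution is unchanged ("lossless"). A real system
        # verifies all k positions in a single batched target forward pass;
        # the per-token calls here are only for readability.
        accepted = 0
        for tok in draft:
            if target_next_token(seq) == tok:
                seq.append(tok)
                accepted += 1
                produced += 1
                if produced >= max_new_tokens:
                    return seq
            else:
                break
        if accepted < len(draft):
            # On mismatch, fall back to one target-model token and continue.
            # (A real implementation reuses the verification logits here.)
            seq.append(target_next_token(seq))
            produced += 1
    return seq

# Toy check with deterministic stand-in "models": both emit previous token + 1,
# so every draft is accepted.
out = speculative_generate([0], lambda s, k: [s[-1] + i + 1 for i in range(k)],
                           lambda s: s[-1] + 1, max_new_tokens=8)
print(out)  # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```

The speedup comes from the target model checking k tokens in one forward pass instead of generating them one by one; the acceptance test guarantees the result matches ordinary decoding.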

Tong Wu, Junzhe Shen, Zixia Jia, Yuxuan Wang, Zilong Zheng • 2025

Related benchmarks

| Task | Dataset | Throughput Speedup (micro-averaged) | Rank |
|---|---|---|---|
| Long-context generation | PG-19, 10K context length | 1.41 | 6 |
| Long-context generation | PG-19, 20K context length | 1.70 | 6 |
| Long-context generation | PG-19, 30K context length | 1.83 | 6 |
| Long-context generation | PG-19, 40K context length | 1.98 | 6 |
| Long-context generation | PG-19, 50K context length | 2.06 | 6 |
| Long-context generation | PG-19, 60K context length | 2.11 | 6 |
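For reference, a micro-averaged throughput speedup is conventionally computed by pooling generated tokens and wall-clock time across all runs before dividing, rather than averaging per-run ratios. The sketch below assumes that convention; the record format is illustrative and not taken from the paper's evaluation code.

```python
# Hedged sketch: micro-averaged throughput speedup under the pooling
# convention. The (tokens, baseline_seconds, accelerated_seconds) record
# format is an assumption for illustration only.

def micro_avg_speedup(runs):
    """Each run is a tuple (tokens, baseline_seconds, accelerated_seconds)."""
    total_tokens = sum(t for t, _, _ in runs)
    base_tput = total_tokens / sum(b for _, b, _ in runs)  # tokens/s, baseline
    fast_tput = total_tokens / sum(a for _, _, a in runs)  # tokens/s, accelerated
    return fast_tput / base_tput

# Example with made-up numbers (not the table's results):
print(micro_avg_speedup([(10_000, 120.0, 60.0), (10_000, 150.0, 80.0)]))  # ~1.93
```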
