
Large Language Model as Token Compressor and Decompressor

About

In this paper, we establish the novel insight that an off-the-shelf LLM can function as an excellent token compressor and decompressor. To demonstrate this, we design a self-expressive autoencoding framework that fine-tunes a pretrained LLM, via lightweight LoRA-based adapter heads, to translate long texts into a compact internal language of discrete, variable-length latent codes, termed Z-tokens, and to reconstruct the original text exactly from them. The resulting representation is content-adaptive: semantically dense segments receive more Z-tokens, while redundant or predictable regions are aggressively compressed. Empirically, our method achieves up to 18x token reduction on Wikipedia, CNN/DailyMail, HotpotQA, and Qulac-style long-query datasets, while preserving reconstruction fidelity and downstream performance. This simple yet effective design supports applications including prompt compression and autoregressive generation directly in the Z-token space, offering a potential pathway toward token-efficient long-context reasoning.
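The core idea above, a lossless round trip through a shorter, content-adaptive code sequence where predictable regions collapse into fewer codes, can be illustrated with a deliberately simple analogy. The sketch below is not the paper's learned Z-token method (which uses a fine-tuned LLM); it is a toy run-length coder over tokens, with all names hypothetical, shown only to make the compress/decompress contract and the "redundant regions compress hardest" property concrete.

```python
# Toy analogy, NOT the paper's method: a run-length coder over tokens.
# Each (token, count) pair plays the role of one variable-length latent
# code; repeated (i.e. maximally predictable) spans collapse into a
# single code, while novel tokens each keep one code.

def compress(tokens):
    """Map a token sequence to a shorter list of (token, run_length) codes."""
    codes = []
    for t in tokens:
        if codes and codes[-1][0] == t:
            codes[-1] = (t, codes[-1][1] + 1)  # extend the current run
        else:
            codes.append((t, 1))               # start a new code
    return codes

def decompress(codes):
    """Exactly reconstruct the original token sequence from the codes."""
    return [t for t, n in codes for _ in range(n)]

tokens = ["the", "the", "the", "cat", "sat", "sat"]
codes = compress(tokens)
assert decompress(codes) == tokens          # lossless round trip
ratio = len(tokens) / len(codes)
print(codes, ratio)                         # 3 codes for 6 tokens -> ratio 2.0
```

The analogy is loose: the paper's compressor is learned and exploits semantic predictability rather than literal repetition, but the interface (exact reconstruction from a shorter, adaptively sized code sequence) is the same contract.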

Wenbing Li, Zikai Song, Jielei Zhang, Tianhao Zhao, Junkai Lin, Yiran Wang, Wei Yang• 2026

Related benchmarks

Task                  Dataset              Metric             Result   Rank
Question Answering    NarrativeQA (test)   -                  -        68
Question Answering    QASPER (test)        F1 Score (Match)   18.31    27
Text Summarization    CNN/DailyMail        ROUGE-1            32.58    13
Reconstruction        Wikipedia            BLEU-4             99.31    8
Question Answering    QuALITY (test)       F1 Score           39.25    6
Question Answering    HotpotQA (test)      F1 Score           33.35    6
