LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression

About

In long context scenarios, large language models (LLMs) face three main challenges: higher computational cost, performance reduction, and position bias. Research indicates that LLM performance hinges on the density and position of key information in the input prompt. Inspired by these findings, we propose LongLLMLingua for prompt compression towards improving LLMs' perception of the key information to simultaneously address the three challenges. Our extensive evaluation across various long context scenarios demonstrates that LongLLMLingua not only enhances performance but also significantly reduces costs and latency. For instance, in the NaturalQuestions benchmark, LongLLMLingua boosts performance by up to 21.4% with around 4x fewer tokens in GPT-3.5-Turbo, leading to substantial cost savings. It achieves a 94.0% cost reduction in the LooGLE benchmark. Moreover, when compressing prompts of about 10k tokens at ratios of 2x-6x, LongLLMLingua can accelerate end-to-end latency by 1.4x-2.6x. Our code is available at https://aka.ms/LongLLMLingua.
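As a rough illustration of how the compression ratios quoted above relate to token savings, here is a minimal sketch. It assumes that cost scales linearly with the number of prompt tokens, which is a simplification and not necessarily the accounting used in the paper (output tokens and the overhead of the compressor itself are ignored). The function names are hypothetical and are not part of the LongLLMLingua code.

```python
def input_token_savings(compression_ratio: float) -> float:
    """Fraction of prompt tokens removed at a given compression ratio.

    Assumption: a ratio of r means the compressed prompt has 1/r of the
    original tokens, so the fraction removed is 1 - 1/r.
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return 1.0 - 1.0 / compression_ratio


def ratio_for_savings(savings: float) -> float:
    """Compression ratio needed to remove a given fraction of prompt tokens."""
    if not 0.0 <= savings < 1.0:
        raise ValueError("savings must be in [0, 1)")
    return 1.0 / (1.0 - savings)


if __name__ == "__main__":
    # Ratios mentioned in the abstract: 2x, 4x, and 6x.
    for r in (2, 4, 6):
        print(f"{r}x compression removes {input_token_savings(r):.0%} of prompt tokens")
    # Under the linear-cost assumption only, a 94% reduction would need
    # a ratio of roughly 1 / (1 - 0.94).
    print(f"ratio for 94% savings: {ratio_for_savings(0.94):.1f}x")
```

Under this simplified model, about 4x fewer tokens corresponds to removing 75% of the prompt tokens. The reported cost figures may differ from this estimate because real pricing also depends on output tokens.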

Huiqiang Jiang, Qianhui Wu, Xufang Luo, Dongsheng Li, Chin-Yew Lin, Yuqing Yang, Lili Qiu • 2023

Related benchmarks

Task                                  Dataset            Result               Rank
Mathematical Reasoning                GSM8K              Accuracy 43.46       983
Multi-hop Question Answering          HotpotQA           F1 Score 38.07       221
Multi-hop Question Answering          2WikiMQA           F1 Score 35.3        154
Long-context Language Understanding   LongBench (test)   Average Score 34.4   133
Question Answering                    HotpotQA           F1 49.4              114
Question Answering                    SQuAD (test)       F1 65.38             111
Question Answering                    PopQA              EM 39.2              80
Long-context Understanding            LongBench (test)   Avg Score 35.5       80
Question Answering                    HotpotQA           EM 34.88             79
Question Answering                    2WikiMultihopQA    EM 35.4              73
Showing 10 of 68 rows
