Compressing Context to Enhance Inference Efficiency of Large Language Models

About

Large language models (LLMs) achieved remarkable performance across various tasks. However, they face challenges in managing long documents and extended conversations, due to significantly increased computational requirements, both in memory and inference time, and potential context truncation when the input exceeds the LLM's fixed context length. This paper proposes a method called Selective Context that enhances the inference efficiency of LLMs by identifying and pruning redundancy in the input context to make the input more compact. We test our approach using common data sources requiring long context processing: arXiv papers, news articles, and long conversations, on tasks of summarisation, question answering, and response generation. Experimental results show that Selective Context significantly reduces memory cost and decreases generation latency while maintaining comparable performance compared to that achieved when full context is used. Specifically, we achieve a 50\% reduction in context cost, resulting in a 36\% reduction in inference memory usage and a 32\% reduction in inference time, while observing only a minor drop of .023 in BERTscore and .038 in faithfulness on four downstream applications, indicating that our method strikes a good balance between efficiency and performance.

Yucheng Li, Bo Dong, Chenghua Lin, Frank Guerin• 2023

Related benchmarks

Task	Dataset	Result
Mathematical Reasoning	GSM8K	Accuracy61.33	1398
Reasoning	BBH	Accuracy50.07	726
Multi-hop Question Answering	HotpotQA (test)	F154.9	311
Long-context Language Understanding	LongBench	--	294
Question Answering	SQuAD (test)	F159.7	156
Long-context Understanding	LongBench (test)	Avg Score20.2	136
Mathematical Reasoning	GSM8K	EM2.5	123
Long-context Understanding	LongBench	Overall Average Score32.16	115
Reasoning	BBH (test)	--	94
Multi-source information fusion in RAG-based Web Q&A	EntityQuestions	AUC Score499.9	72

Showing 10 of 48 rows

Other info

Follow for update

@wizwand_team Discord