Developing Adaptive Context Compression Techniques for Large Language Models (LLMs) in Long-Running Interactions
About
Large Language Models (LLMs) often experience performance degradation during long-running interactions due to growing context length, memory saturation, and computational overhead. This paper presents an adaptive context compression framework that integrates importance-aware memory selection, coherence-sensitive filtering, and dynamic budget allocation to retain essential conversational information while controlling context growth. The approach is evaluated on the LoCoMo, LOCCO, and LongBench benchmarks to assess answer quality, retrieval accuracy, coherence preservation, and efficiency. Experimental results demonstrate that the proposed method achieves consistent improvements in conversational stability and retrieval performance while reducing token usage and inference latency compared with existing memory- and compression-based approaches. These findings indicate that adaptive context compression provides an effective balance between long-term memory preservation and computational efficiency in persistent LLM interactions.
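The core idea of importance-aware selection under a context budget can be sketched as follows. This is a minimal illustrative example, not the paper's actual algorithm: the greedy importance-per-token heuristic, the `MemoryItem` fields, and the scores are all assumptions introduced here.

```python
# Hypothetical sketch: keep the most important memory items that fit a
# token budget, then restore chronological order to preserve coherence.
from dataclasses import dataclass

@dataclass
class MemoryItem:
    text: str
    importance: float  # assumed precomputed importance score
    tokens: int        # token cost of keeping this item in context

def compress_context(items: list[MemoryItem], budget: int) -> list[MemoryItem]:
    """Greedily select items by importance-per-token until the budget is full."""
    kept, used = [], 0
    for item in sorted(items, key=lambda m: m.importance / m.tokens, reverse=True):
        if used + item.tokens <= budget:
            kept.append(item)
            used += item.tokens
    # Re-sort to the original (chronological) order for coherence.
    kept.sort(key=lambda m: items.index(m))
    return kept
```

A real system would also apply coherence-sensitive filtering (e.g. keeping items that later turns depend on) and adapt `budget` dynamically as the conversation grows; this sketch shows only the budgeted selection step.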
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Question Answering | SQuAD | -- | 134 |
| Open-domain Question Answering | MS MARCO | -- | 48 |
| Long-running conversation | LoCoMo (test) | Answer Accuracy: 89 | 6 |
| Long-context Conversational Memory | LOCCO | Consistency Score: 4.45 | 2 |
| Long-context memory management | LoCoMo | Overall QA F1: 52 | 2 |
| Long-term Conversational Memory Evaluation | LOCCO | Consistency: 4.45 | 2 |
| Code Completion | RepoBench | -- | 1 |
| Long-context compression and memory management | Various (SCM, HotpotQA, MS MARCO, SQuAD, LongBench, RULER, InfiniteBench, LoCoMo, LOCCO) | -- | 1 |
| Long-context modeling | RULER | -- | 1 |
| Long-context modeling | InfiniteBench | -- | 1 |