SONIC: Segmented Optimized Nexus for Information Compression in Key-Value Caching

About

The linear growth of the Key-Value (KV) cache with conversation length remains a bottleneck for multi-turn LLM deployment. Existing KV cache compression methods often fail to account for the structural properties of multi-turn dialogues and rely on heuristic eviction that risks losing critical context. We propose SONIC, a learning-based framework that compresses historical segments into compact, semantically rich Nexus tokens. By integrating dynamic budget training, SONIC adapts flexibly to varying memory constraints without retraining. Experiments show that at compression ratios of 80% and 50%, SONIC consistently outperforms baselines such as H2O and StreamingLLM on four diverse multi-turn benchmarks. Specifically, on the widely used MTBench101 benchmark, SONIC achieves an average score improvement of 35.55% over state-of-the-art baselines, validating its effectiveness in sustaining coherent multi-turn dialogues. Furthermore, SONIC improves deployment efficiency, accelerating the overall inference process by 50.1% compared to full-context generation.
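To make the memory pressure concrete, the Python sketch below computes KV cache size for full-context generation versus a fixed retention budget. It is not SONIC's compression method. The model dimensions are hypothetical, and the `retained_fraction` parameter (fraction of cached positions kept) is an assumed parameterization, because the abstract does not say whether a "compression ratio" of 80% means 80% of positions are kept or removed.

```python
# Minimal sketch: KV cache memory for full context vs. a fixed retention budget.
# Model dimensions used below are hypothetical, not taken from the SONIC paper.

from dataclasses import dataclass


@dataclass(frozen=True)
class ModelDims:
    num_layers: int
    num_kv_heads: int
    head_dim: int
    bytes_per_elem: int = 2  # fp16 / bf16


def kv_cache_bytes(num_tokens: int, dims: ModelDims) -> int:
    """Bytes needed to cache keys and values for `num_tokens` tokens.

    The leading 2 accounts for one key and one value vector per token,
    per layer, per KV head, so the total is linear in num_tokens.
    """
    per_token = 2 * dims.num_layers * dims.num_kv_heads * dims.head_dim * dims.bytes_per_elem
    return per_token * num_tokens


def budgeted_tokens(num_tokens: int, retained_fraction: float) -> int:
    """Number of cached positions kept under an assumed retention budget.

    `retained_fraction` is a hypothetical parameter (fraction of positions
    kept); how the paper's "compression ratio" maps onto it is unspecified.
    """
    if not 0.0 < retained_fraction <= 1.0:
        raise ValueError("retained_fraction must be in (0, 1]")
    return max(1, int(num_tokens * retained_fraction))


if __name__ == "__main__":
    dims = ModelDims(num_layers=32, num_kv_heads=32, head_dim=128)
    gib = 1024 ** 3
    # Cumulative context length after each turn of a hypothetical 4-turn dialogue.
    for tokens in (2048, 4096, 6144, 8192):
        full = kv_cache_bytes(tokens, dims) / gib
        half = kv_cache_bytes(budgeted_tokens(tokens, 0.5), dims) / gib
        print(f"{tokens:5d} tokens: full {full:.2f} GiB, 50% budget {half:.2f} GiB")
```

With these assumed dimensions each token costs 512 KiB, so the full cache grows from 1 GiB at 2,048 tokens to 4 GiB at 8,192 tokens, while a budget that keeps half the positions halves each figure. A learned compressor such as SONIC targets the same budget but chooses what to keep differently from heuristic eviction.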

Hong Chen, Xiang Liu, Bo Wang, Yuxuan Fan, Yuanlin Chu, Zongluo Li, Xiaowen Chu, Xuming Hu • 2026

Related benchmarks

Task	Dataset	Result
Multi-turn dialogue	MTBench101	Score 8.43	33
Safety Dialogue Evaluation	SafeDialBench	Score 8.33	33
Coreference Resolution	CoreRes	Accuracy 69.01	33
Mathematical Reasoning	GSM8K Var	Accuracy 8.3	33

