SONIC: Segmented Optimized Nexus for Information Compression in Key-Value Caching

About

The linear growth of the Key-Value (KV) cache remains a bottleneck for multi-turn LLM deployment. Existing KV cache compression methods often fail to account for the structural properties of multi-turn dialogues, relying on heuristic eviction that risks losing critical context. We propose SONIC, a learning-based framework that compresses historical segments into compact and semantically rich Nexus tokens. By integrating dynamic budget training, SONIC adapts flexibly to varying memory constraints without retraining. Experiments show that at compression ratios of 80% and 50%, SONIC consistently outperforms baselines such as H2O and StreamingLLM on four diverse multi-turn benchmarks. Specifically, on the widely used MTBench101 benchmark, SONIC achieves an average score improvement of 35.55% over state-of-the-art baselines, validating its effectiveness in sustaining coherent multi-turn dialogues. Furthermore, SONIC enhances deployment efficiency, accelerating the overall inference process by 50.1% compared to full-context generation.
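To make the "linear growth" claim concrete, here is a minimal back-of-the-envelope sketch of KV cache memory as a function of context length. It is not SONIC's method. The model dimensions (32 layers, 8 KV heads, head dimension 128, 16-bit values) and the helper names are hypothetical, and reading a "compression ratio" as the percentage of cached tokens retained is an assumption made only for this illustration.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Bytes needed to cache keys and values for one sequence.

    The leading factor of 2 covers the separate key and value tensors.
    The result is linear in seq_len, which is the growth the abstract refers to.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem


def retained_tokens(seq_len, keep_percent):
    """Tokens kept under a budget, ASSUMING the ratio is the share retained."""
    return seq_len * keep_percent // 100


# Hypothetical model configuration, chosen only for illustration.
CONFIG = dict(n_layers=32, n_kv_heads=8, head_dim=128, bytes_per_elem=2)

if __name__ == "__main__":
    for turns_tokens in (1024, 4096, 16384):
        full = kv_cache_bytes(turns_tokens, **CONFIG)
        for pct in (80, 50):
            kept = retained_tokens(turns_tokens, pct)
            budget = kv_cache_bytes(kept, **CONFIG)
            print(f"{turns_tokens} tokens: full={full / 2**20:.0f} MiB, "
                  f"{pct}% budget={budget / 2**20:.0f} MiB")
```

Under these assumed dimensions each cached token costs 128 KiB, so a 4096-token dialogue history occupies 512 MiB before any compression.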

Hong Chen, Xiang Liu, Bo Wang, Yuxuan Fan, Yuanlin Chu, Zongluo Li, Xiaowen Chu, Xuming Hu • 2026

Related benchmarks

Task                        Dataset        Result          Rank
Multi-turn dialogue         MTBench101     Score 8.43      33
Safety Dialogue Evaluation  SafeDialBench  Score 8.33      33
Coreference Resolution      CoreRes        Accuracy 69.01  33
Mathematical Reasoning      GSM8K Var      Accuracy 8.3    33