InfoFlow KV: Information-Flow-Aware KV Recomputation for Long Context

About

Retrieval-augmented generation (RAG) for long-context question answering is bottlenecked by inference-time prefilling over large retrieved contexts. A common strategy is to precompute key-value (KV) caches for individual documents and selectively recompute a small subset of tokens to restore global causal dependencies, but existing methods rely on heuristics or representation discrepancies without modeling whether selected tokens can effectively influence generation. We cast selective KV recomputation as an information flow problem and show that a simple attention-norm signal from the query reliably identifies tokens that are both semantically relevant and structurally positioned to propagate information, when computed under an inference-consistent RoPE geometry. We therefore reconstruct global positional assignments for retrieved chunks and introduce an information-flow-guided chunk reordering strategy. Experiments on LLM and VLM benchmarks demonstrate consistent gains over prior methods under comparable efficiency budgets.
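The selection step described above can be illustrated with a minimal sketch. The abstract does not define the attention-norm signal precisely, so the code below assumes one plausible reading: each context token is scored by the L2 norm of the attention weights it receives from the query tokens, and the top-scoring tokens within a budget are marked for recomputation. The function names, the `budget` parameter, and the scoring formula are all hypothetical. RoPE position reconstruction and chunk reordering are not modeled here.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_norm_scores(queries, keys):
    """Score each context token by the L2 norm of the attention it receives.

    ASSUMPTION: this scoring rule is an illustrative guess at what an
    "attention-norm signal from the query" could look like; it is not
    taken from the paper.
    """
    scale = 1.0 / math.sqrt(len(keys[0]))
    # attn[i][j] = attention weight from query token i to context token j
    attn = [
        softmax([scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys])
        for q in queries
    ]
    return [
        math.sqrt(sum(row[j] ** 2 for row in attn))
        for j in range(len(keys))
    ]


def select_tokens_to_recompute(queries, keys, budget):
    """Return the indices of the `budget` highest-scoring context tokens,
    in ascending position order (hypothetical helper)."""
    scores = attention_norm_scores(queries, keys)
    ranked = sorted(range(len(keys)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:budget])


# Toy example: two query vectors, four cached context keys.
queries = [[1.0, 0.0], [0.0, 1.0]]
keys = [[5.0, 0.0], [0.0, 5.0], [0.0, 0.0], [-5.0, -5.0]]
selected = select_tokens_to_recompute(queries, keys, budget=2)
```

In this toy setup the first two context keys align with the query vectors, so they receive the most attention and are the ones selected. A real system would take the query and key tensors from a chosen layer and head of the model rather than hand-written vectors.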

Xin Teng, Canyu Zhang, Shaoyi Zheng, Danyang Zhuo, Tianyi Zhou, Shengjie Wang • 2026

Related benchmarks

Task	Dataset	Result
Visual Question Answering	ChartQA	Accuracy 73.48	519
Real-world Visual Question Answering	RealworldQA	Accuracy 68.1	173
Visual Question Answering	InfoVQA (val)	Accuracy 73.07	91
Visual Question Answering	HRBench 4K	Accuracy 0.7262	61
Visual Question Answering	OCRBench	Score 842	53
Long-context Question Answering	2WikiMQA Fixed Chunk 2048	QA Score 51.76	18
Long-context Question Answering	MuSiQue Fixed Chunk 2048	Score 37.86	18
Long-context Question Answering	HotpotQA Fixed Chunk 2048	QA Score 59.67	18
Long-context Question Answering	NarrativeQA Fixed Chunk 2048	Score 32.39	18
Long-context Question Answering	MuSiQue (Passage Split)	Score 37.58	18

Showing 10 of 13 rows
