CoRDS: Coreset-based Representative and Diverse Selection for Streaming Video Understanding

About

Streaming video understanding with large vision-language models (VLMs) requires a compact memory that can support future reasoning over an ever-growing visual history. A common solution is to compress the key-value (KV) cache, but existing streaming methods typically rely on local token-wise heuristics, such as recency, temporal redundancy, or saliency, which do not explicitly optimize for how well the retained cache represents the accumulated history. We propose to view KV-cache compression as a coreset selection problem: rather than scoring tokens independently for retention, we select a small subset that covers the geometry of the accumulated visual cache. Our method operates in a joint KV representation and introduces a bicriteria objective that balances coverage in key and value spaces, preserving both retrieval structure and output-relevant information. To encourage a more diverse retained subset, we further introduce an orthogonality-driven diversity criterion that favors candidates contributing new directions beyond the current selection, and we connect this criterion to log-determinant subset selection. Across four open-source VLMs and five long-video and streaming-video benchmarks, our method improves over heuristic streaming compression baselines under a fixed cache budget. These results suggest that representative coreset selection is a more effective principle than token-wise pruning for memory-constrained streaming video understanding.
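
To make the two selection ideas in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' algorithm: the abstract does not give the objective, so the sketch substitutes two standard greedy procedures. `select_coverage` is farthest-point (k-center style) selection under a weighted sum of squared key and value distances, where the weight `alpha`, the distance, and seeding with the most recent token are all assumptions made for illustration. `select_diverse` repeatedly picks the candidate with the largest residual after projecting out the directions already selected, which is the usual greedy step for maximizing the log-determinant of a linear-kernel Gram matrix. All function and parameter names are hypothetical, and a real implementation would operate on batched tensors rather than Python lists.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_coverage(keys, values, budget, alpha=0.5):
    """Greedy farthest-point selection over a joint key/value cache.

    alpha (assumed, not from the paper) weights key-space distance against
    value-space distance. Returns the sorted indices of retained tokens.
    """
    n = len(keys)
    budget = min(budget, n)
    if budget <= 0:
        return []
    selected = [n - 1]  # assumption: seed with the most recent token
    min_dist = [float("inf")] * n  # distance of each token to the selection
    while len(selected) < budget:
        last = selected[-1]
        for i in range(n):
            d = (alpha * sq_dist(keys[i], keys[last])
                 + (1 - alpha) * sq_dist(values[i], values[last]))
            min_dist[i] = min(min_dist[i], d)
        chosen = set(selected)
        # Keep the token that the current selection covers worst.
        nxt = max((i for i in range(n) if i not in chosen),
                  key=lambda i: min_dist[i])
        selected.append(nxt)
    return sorted(selected)


def select_diverse(feats, budget):
    """Greedy selection by largest residual norm (Gram-Schmidt style).

    Each step picks the vector with the most energy outside the span of the
    vectors already chosen, then removes that direction from all residuals.
    Returns indices in selection order.
    """
    resid = [[float(x) for x in f] for f in feats]
    selected = []
    for _ in range(min(budget, len(resid))):
        norms = [sum(x * x for x in r) for r in resid]
        nxt = max((i for i in range(len(resid)) if i not in selected),
                  key=lambda i: norms[i])
        selected.append(nxt)
        if norms[nxt] > 1e-12:
            scale = norms[nxt] ** 0.5
            q = [x / scale for x in resid[nxt]]  # new orthonormal direction
            for r in resid:
                proj = sum(a * b for a, b in zip(r, q))
                for j in range(len(r)):
                    r[j] -= proj * q[j]
    return selected
```

For example, with five tokens whose keys and values are both `[[0, 0], [1, 0], [10, 0], [12, 0], [0, 10]]` and a budget of 3, `select_coverage` keeps one token from each well-separated region instead of the three most recent ones.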

Ailar Mahdizadeh, Puria Azadi, Muchen Li, Xiangteng He, Leonid Sigal • 2026

Related benchmarks

Task	Dataset	Result
Long Video Understanding	MLVU	--	265
Long Video Understanding	Video-MME (full)	Overall Performance 64.84	51
Offline Video Understanding	VideoMME v1 (test)	Accuracy 65.7	27
Offline Video Understanding	MLVU v1 (test)	Accuracy 71.5	26
Offline Video Understanding	EgoSchema v1 (test)	Accuracy 68.4	22
Multi-task Visual Reasoning	OVO-Bench	Backward Avg 50.46	3
