EvoStreaming: Your Offline Video Model Is a Natively Streaming Assistant

About

Streaming video understanding demands more than watching longer videos: assistants must decide when to speak in real time, balancing responsiveness against verbosity. Yet most video-language models (VideoLLMs) are trained for offline inference, and existing streaming benchmarks externalize this timing decision to the evaluator. We address this gap with RealStreamEval, a frame-level multi-turn evaluation protocol that exposes models to sequential observations and penalizes unnecessary responses. Under this protocol, we observe that strong offline VideoLLMs retain useful visual understanding but lack an interaction policy for deciding when to respond. Motivated by this observation, we propose EvoStreaming, a self-evolved streaming adaptation framework in which the base model itself acts as data generator, relevance annotator, and roll-out policy to synthesize streaming trajectories without external supervision. With only $1{,}000$ self-generated samples ($139\times$ fewer than the leading streaming instruction-tuning approach) and no architectural changes, EvoStreaming consistently improves the overall RealStreamEval score by up to $10.8$ points across five open VideoLLM backbones (Qwen2/2.5/3-VL, InternVL-3.5, MiniCPM-V4.5) while largely preserving offline video performance. These results suggest that data-efficient interaction tuning is a practical path for adapting existing VideoLLMs into streaming assistants.
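As a rough illustration of what a frame-level protocol that "penalizes unnecessary responses" could look like, the toy sketch below feeds frames to a policy one at a time and scores its decision to speak or stay silent. It is not the RealStreamEval metric: the `Frame` type, the `answerable` label, the `penalty` weight, and the normalization are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class Frame:
    index: int
    answerable: bool  # hypothetical label: a response is warranted at this frame


# A policy sees every frame observed so far and returns a reply, or None to stay silent.
Policy = Callable[[List[Frame]], Optional[str]]


def evaluate_stream(frames: Sequence[Frame], policy: Policy, penalty: float = 0.5) -> float:
    """Toy streaming score (assumed form, not the paper's definition).

    +1 for speaking at a frame that warrants a response, -penalty for speaking
    at any other frame, 0 for staying silent. The total is divided by the
    number of frames that warranted a response.
    """
    history: List[Frame] = []
    score = 0.0
    for frame in frames:
        history.append(frame)          # the model only ever sees past and current frames
        reply = policy(history)
        if reply is None:
            continue                   # silence earns no credit and no penalty
        if frame.answerable:
            score += 1.0               # timely response
        else:
            score -= penalty           # unnecessary response
    needed = sum(f.answerable for f in frames)
    return score / needed if needed else 0.0
```

Under this toy scoring, a policy that speaks on every frame can score below one that never speaks, which is the kind of trade-off between responsiveness and verbosity that the abstract describes.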

Zichen Wen, Boxue Yang, Junlong Ke, Jiajie Huang, Chenfei Liao, Junxi Wang, Xuyang Liu, Linfeng Zhang • 2026

Related benchmarks

Task	Dataset	Result
Video Understanding	VideoMME	Score (Overall): 63.5	369
Video Understanding	EgoSchema	--	185
Video Understanding	MLVU	Accuracy: 66	147
Video Understanding	LongVideoBench	Accuracy: 59.2	128
General Video Understanding	LVBench	Accuracy: 42.2	42
Streaming Video Understanding	OVO-Bench RealStreamEval protocol	OCR: 82.9	17
General Video Understanding	Combined (VideoMME, LVBench, LongVideoBench, EgoSchema, MLVU)	Average Score: 57.8	11

