StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant

About

We present StreamBridge, a simple yet effective framework that seamlessly transforms offline Video-LLMs into streaming-capable models. It addresses two fundamental challenges in adapting existing models to online scenarios: (1) limited capability for multi-turn real-time understanding, and (2) lack of proactive response mechanisms. Specifically, StreamBridge incorporates (1) a memory buffer combined with a round-decayed compression strategy, supporting long-context multi-turn interactions, and (2) a decoupled, lightweight activation model that can be effortlessly integrated into existing Video-LLMs, enabling continuous proactive responses. To further support StreamBridge, we construct Stream-IT, a large-scale dataset tailored for streaming video understanding, featuring interleaved video-text sequences and diverse instruction formats. Extensive experiments show that StreamBridge significantly improves the streaming understanding capabilities of offline Video-LLMs across various tasks, outperforming even proprietary models such as GPT-4o and Gemini 1.5 Pro. At the same time, it achieves competitive or superior performance on standard video understanding benchmarks.

Haibo Wang, Bo Feng, Zhengfeng Lai, Mingze Xu, Shiyu Li, Weifeng Ge, Afshin Dehghan, Meng Cao, Ping Huang • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Streaming Video Understanding | StreamingBench | Overall: 57.12 | 158 |
| Real-Time Visual Understanding | StreamingBench | Overall Score: 73.79 | 96 |
| Long Video Understanding | VideoMME | Accuracy: 64.4 | 40 |
| Streaming Video Understanding | OVOBench | Accuracy (Proactive Forwarding): 48.4 | 17 |
| Readiness-aware streaming understanding | ProReady-QA | SSR Accuracy: 72.2 | 14 |
| Dense Video Captioning | E.T.Bench | -- | 14 |
| Online Activation Accuracy | ET-Bench | TVG F1: 35.7 | 10 |
| Step Localization and Captioning | ET-Bench | F1 Score: 22.6 | 4 |
| Temporal Video Grounding | ET-Bench | F1-score: 34.3 | 4 |