MemFlow: Flowing Adaptive Memory for Consistent and Efficient Long Video Narratives

About

The core challenge in streaming video generation is maintaining content consistency over long contexts, which places high demands on memory design. Most existing solutions maintain memory by compressing historical frames with predefined strategies. However, different video chunks to be generated should refer to different historical cues, which is hard to satisfy with fixed strategies. In this work, we propose MemFlow to address this problem. Specifically, before generating the next chunk, we dynamically update the memory bank by retrieving the historical frames most relevant to the text prompt of that chunk. This design enables narrative coherence even when a new event occurs or the scene switches in future frames. In addition, during generation, we activate only the most relevant tokens in the memory bank for each query in the attention layers, which keeps generation efficient. In this way, MemFlow achieves outstanding long-context consistency with negligible computational overhead (a 7.9% speed reduction compared with the memory-free baseline) and remains compatible with any streaming video generation model that uses a KV cache.

Sihui Ji, Xi Chen, Shuai Yang, Xin Tao, Pengfei Wan, Hengshuang Zhao • 2025
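
The two mechanisms described in the abstract (prompt-conditioned memory retrieval before each chunk, and sparse activation of memory tokens inside attention) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `update_memory` and `sparse_attention`, the use of cosine similarity for retrieval, the top-k selection rule, and the plain-list data layout are all assumptions chosen for illustration, and the real model operates on KV-cache tensors rather than small Python lists.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def update_memory(history, prompt_emb, bank_size):
    """Keep the bank_size historical frames most similar to the prompt.

    history: list of (frame_id, embedding) pairs (hypothetical layout).
    Returns the selected frame ids in ascending (temporal) order.
    """
    ranked = sorted(history, key=lambda fe: cosine(fe[1], prompt_emb), reverse=True)
    return sorted(fid for fid, _ in ranked[:bank_size])


def sparse_attention(query, keys, values, top_k):
    """Attend over only the top_k keys with the highest scaled dot-product score.

    Assumes top_k >= 1 and at least one key. Returns (output, active_indices).
    """
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    active = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Softmax restricted to the active tokens; inactive tokens get zero weight.
    peak = max(scores[i] for i in active)
    weights = [math.exp(scores[i] - peak) for i in active]
    total = sum(weights)
    out = [0.0] * len(values[0])
    for w, i in zip(weights, active):
        for d in range(len(out)):
            out[d] += (w / total) * values[i][d]
    return out, sorted(active)


# Toy data (made up): four historical frames with 2-D embeddings.
history = [(0, [1.0, 0.0]), (1, [0.0, 1.0]), (2, [0.9, 0.1]), (3, [-1.0, 0.0])]
bank = update_memory(history, prompt_emb=[1.0, 0.0], bank_size=2)

keys = [emb for _, emb in history]
values = [[1.0], [5.0], [3.0], [7.0]]
out_one, active_one = sparse_attention([1.0, 0.0], keys, values, top_k=1)
out_two, active_two = sparse_attention([1.0, 0.0], keys, values, top_k=2)
```

In this toy setup, retrieval keeps the frames whose embeddings point in roughly the same direction as the prompt embedding, and the attention step skips every memory token outside the top-k set, so the cost per query scales with `top_k` rather than with the full memory size.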

Related benchmarks

Task | Dataset | Metric | Result | Rank
Video Generation | VBench Long | Motion Smoothness | 16.44 | 49
Video Generation | VBench | Motion Smoothness | 97.17 | 37
Video Generation | VBench Short-Duration extended prompt suite | Total Score | 83.01 | 12
Video Generation | Single-prompt 5-second setting | Total Score | 85.14 | 11
Video Generation | VBench standard prompt (5s setting) | Dynamic Score | 50 | 11
Video Generation | VBench single-prompt 5-second setting | Dynamic Score | 50 | 11
Short Video Generation | VBench standard | Total Score | 83.13 | 9
Multi-prompt Video Generation | NarraStream-Bench multi-prompt 60-second setting | SC | 95.37 | 6
Video Generation | VBench multi-prompt 60-second setting | Aesthetic Quality | 60.02 | 6
Long Video Generation | Multi-prompt 60-second setting | Throughput (FPS) | 18.7 | 6
Showing 10 of 23 rows
