
MemRoPE: Training-Free Infinite Video Generation via Evolving Memory Tokens

About

Autoregressive diffusion enables real-time frame streaming, yet existing sliding-window caches discard past context, causing fidelity degradation, identity drift, and motion stagnation over long horizons. Current approaches preserve a fixed set of early tokens as attention sinks, but this static anchor cannot reflect the evolving content of a growing video. We introduce MemRoPE, a training-free framework with two co-designed components. Memory Tokens continuously compress all past keys into dual long-term and short-term streams via exponential moving averages, maintaining both global identity and recent dynamics within a fixed-size cache. Online RoPE Indexing caches unrotated keys and applies positional embeddings dynamically at attention time, ensuring the aggregation is free of conflicting positional phases. These two mechanisms are mutually enabling: positional decoupling makes temporal aggregation well-defined, while aggregation makes fixed-size caching viable for unbounded generation. Extensive experiments validate that MemRoPE outperforms existing methods in temporal coherence, visual fidelity, and subject consistency across minute- to hour-scale generation.
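The two components described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The class `KeyMemory`, the function names, the decay values (0.99 and 0.5), the window size, the choice to fold only evicted keys into the memory, and the position layout (memory tokens first, then the window, then the query) are all assumptions made for illustration, since the abstract does not specify them. The sketch handles keys only and uses the half-split RoPE pairing.

```python
import numpy as np

def rope(x, positions, base=10000.0):
    """Rotate rows of x (n, d) by rotary embeddings at the given positions (n,)."""
    half = x.shape[-1] // 2
    inv_freq = base ** (-np.arange(half) / half)
    angles = positions[:, None] * inv_freq[None, :]
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

class KeyMemory:
    """Hypothetical dual-stream memory: two EMAs over UNROTATED keys."""

    def __init__(self, long_decay=0.99, short_decay=0.5):  # illustrative values
        self.long_decay = long_decay
        self.short_decay = short_decay
        self.long_key = None
        self.short_key = None

    def update(self, key):
        # The first key initializes both streams; later keys are blended in.
        if self.long_key is None:
            self.long_key = key.copy()
            self.short_key = key.copy()
            return
        self.long_key = self.long_decay * self.long_key + (1 - self.long_decay) * key
        self.short_key = self.short_decay * self.short_key + (1 - self.short_decay) * key

def attention_weights(query, memory, window_keys):
    """Softmax attention weights over [long memory, short memory, window keys]."""
    keys = np.stack([memory.long_key, memory.short_key, *window_keys])
    # Positions are assigned only now, at attention time (assumed layout).
    k = rope(keys, np.arange(len(keys)))
    q = rope(query[None, :], np.array([len(keys)]))[0]
    logits = (k @ q) / np.sqrt(query.shape[-1])
    logits = logits - logits.max()  # numerical stability
    w = np.exp(logits)
    return w / w.sum()

# Usage: stream 20 keys through a 4-key sliding window; each evicted key
# is folded into the memory instead of being discarded.
rng = np.random.default_rng(0)
memory, window = KeyMemory(), []
for _ in range(20):
    window.append(rng.normal(size=8))
    if len(window) > 4:
        memory.update(window.pop(0))
weights = attention_weights(rng.normal(size=8), memory, window)
```

Because the averages are taken over unrotated keys, no key in the sum carries a rotation from its original position, which is the property the abstract describes as aggregation free of conflicting positional phases. The cache in this sketch stays at six entries no matter how many keys are streamed.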

Youngrae Kim, Qixin Hu, C.-C. Jay Kuo, Peter A. Beerel • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Short Video Generation | VBench-Long 60 seconds | Aesthetic Quality 58.76 | 13 |
| Short Video Generation | VBench-Long 30 seconds | Aesthetic Quality 59.31 | 10 |
| Long Video Generation | VBench Long 120 seconds | Aesthetic Quality 59.25 | 8 |
| Long Video Generation | VBench-Long (240 seconds) | Aesthetic Quality 58.9 | 8 |
| Video Generation | VBench Long 480 seconds | Aesthetic Quality 57.96 | 4 |
| Visual Stability | LongLive 44 | Stability 4.15 | 4 |
| Video Generation | LongLive | Color Consistency 81.6 | 3 |
| Video Generation | Self-Forcing | Color Consistency 93 | 2 |
| Video Generation | VBench-Long 1 hour | Aesthetic Quality 63.05 | 2 |
| Visual Stability | Self-Forcing 19 | -- | 2 |
