MemRoPE: Training-Free Infinite Video Generation via Evolving Memory Tokens
About
Autoregressive diffusion enables real-time frame streaming, yet existing sliding-window caches discard past context, causing fidelity degradation, identity drift, and motion stagnation over long horizons. Current approaches preserve a fixed set of early tokens as attention sinks, but this static anchor cannot reflect the evolving content of a growing video. We introduce MemRoPE, a training-free framework with two co-designed components. Memory Tokens continuously compress all past keys into dual long-term and short-term streams via exponential moving averages, maintaining both global identity and recent dynamics within a fixed-size cache. Online RoPE Indexing caches unrotated keys and applies positional embeddings dynamically at attention time, ensuring the aggregation is free of conflicting positional phases. These two mechanisms are mutually enabling: positional decoupling makes temporal aggregation well-defined, while aggregation makes fixed-size caching viable for unbounded generation. Extensive experiments validate that MemRoPE outperforms existing methods in temporal coherence, visual fidelity, and subject consistency across minute- to hour-scale generation.
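The interaction between the two components can be illustrated with a toy, pure-Python sketch. This is not the authors' implementation. It rests on several assumptions made only for illustration: the decay rates `alpha_long`/`alpha_short`, folding a key into both memory streams at the moment it is evicted from the sliding window, and giving cache-local positions at attention time (memory tokens first, then the window). The names `rope`, `MemoryCache`, `rotated_keys`, and `attention_weights` are hypothetical. Values, multiple heads, and the diffusion model itself are omitted.

```python
import math


def rope(vec, pos, base=10000.0):
    """Rotary positional embedding: rotate each pair (vec[2k], vec[2k+1])
    by angle pos * base**(-2k/d), where d = len(vec) must be even."""
    d = len(vec)
    out = []
    for i in range(0, d, 2):                      # i = 2k
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


class MemoryCache:
    """Hypothetical fixed-size key cache: a sliding window of recent UNROTATED
    keys plus two EMA memory tokens (long-term and short-term)."""

    def __init__(self, window, alpha_long=0.99, alpha_short=0.5):
        self.window = window
        self.alpha_long = alpha_long      # assumed: slow decay -> global identity
        self.alpha_short = alpha_short    # assumed: fast decay -> recent dynamics
        self.recent = []                  # unrotated keys, oldest first
        self.mem_long = None
        self.mem_short = None

    @staticmethod
    def _ema(mem, key, alpha):
        # Exponential moving average; the first key initializes the memory.
        if mem is None:
            return list(key)
        return [alpha * m + (1.0 - alpha) * k for m, k in zip(mem, key)]

    def append(self, key):
        self.recent.append(list(key))
        if len(self.recent) > self.window:
            evicted = self.recent.pop(0)
            # Assumption: a key is folded into memory when it leaves the window.
            self.mem_long = self._ema(self.mem_long, evicted, self.alpha_long)
            self.mem_short = self._ema(self.mem_short, evicted, self.alpha_short)

    def rotated_keys(self):
        """Apply RoPE only now, at attention time, using cache-local positions
        (memory tokens first, then the window). Returns at most window + 2 keys."""
        stored = [m for m in (self.mem_long, self.mem_short) if m is not None]
        stored += self.recent
        return [rope(k, pos) for pos, k in enumerate(stored)]

    def attention_weights(self, query):
        """Softmax over scaled query-key dot products; the query takes the next position."""
        keys = self.rotated_keys()
        if not keys:
            return []
        q = rope(query, len(keys))
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q)) for k in keys]
        peak = max(scores)                        # subtract max for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]


if __name__ == "__main__":
    cache = MemoryCache(window=4)
    for t in range(10):
        cache.append([math.cos(t), math.sin(t), 1.0, 0.0])
    print(len(cache.rotated_keys()))              # 6 = 2 memory tokens + 4 window keys
    print(round(sum(cache.attention_weights([1.0, 0.0, 1.0, 0.0])), 6))  # 1.0
```

Because the keys are stored unrotated, both EMAs average content vectors that share one coordinate frame. Averaging already-rotated keys would instead mix different positional phases: the same key rotated to two different positions can nearly cancel when averaged. In this sketch the cache never holds more than `window + 2` keys, however many frames have been generated.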
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Short Video Generation | VBench-Long (60 seconds) | Aesthetic Quality | 58.76 | 13 |
| Short Video Generation | VBench-Long (30 seconds) | Aesthetic Quality | 59.31 | 10 |
| Long Video Generation | VBench-Long (120 seconds) | Aesthetic Quality | 59.25 | 8 |
| Long Video Generation | VBench-Long (240 seconds) | Aesthetic Quality | 58.9 | 8 |
| Video Generation | VBench-Long (480 seconds) | Aesthetic Quality | 57.96 | 4 |
| Visual Stability | LongLive 44 | Stability | 4.15 | 4 |
| Video Generation | LongLive | Color Consistency | 81.6 | 3 |
| Video Generation | Self-Forcing | Color Consistency | 93 | 2 |
| Video Generation | VBench-Long (1 hour) | Aesthetic Quality | 63.05 | 2 |
| Visual Stability | Self-Forcing 19 | -- | -- | 2 |