
Rolling Sink: Bridging Limited-Horizon Training and Open-Ended Testing in Autoregressive Video Diffusion

About

Recently, autoregressive (AR) video diffusion models have achieved remarkable performance. However, because their training durations are limited, a train-test gap emerges when they are tested at longer horizons, leading to rapid visual degradation. Following Self Forcing, which studies the train-test gap within the training duration, this work studies the train-test gap beyond the training duration, i.e., the gap between the limited horizons seen during training and the open-ended horizons encountered during testing. Since open-ended testing can extend beyond any finite training window, and long-video training is computationally expensive, we pursue a training-free solution to bridge this gap. To this end, we conduct a systematic analysis of AR cache maintenance, and the resulting insights lead to Rolling Sink. Built on Self Forcing (trained on only 5s clips), Rolling Sink effectively scales AR video synthesis to ultra-long durations (e.g., 5-30 minutes at 16 FPS) at test time, with consistent subjects, stable colors, coherent structures, and smooth motion. As demonstrated by extensive experiments, Rolling Sink achieves superior long-horizon visual fidelity and temporal consistency compared to state-of-the-art (SOTA) baselines. Project page: https://rolling-sink.github.io/
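To give a sense of the size of the gap described above, the following minimal sketch converts the durations quoted in the abstract (5s training clips, 5-30 minute test videos, 16 FPS) into output frame counts and horizon ratios. The function names `frames_for` and `horizon_ratio` are hypothetical helpers introduced only for this illustration. The sketch assumes frames are counted at the 16 FPS output rate; the benchmark rows on this page list different frame counts, which presumably count something else.

```python
# Illustrative arithmetic only; the helper names are hypothetical.
FPS = 16            # output frame rate quoted in the abstract
TRAIN_SECONDS = 5   # training clip length quoted in the abstract


def frames_for(seconds: float, fps: int = FPS) -> int:
    """Number of output frames in a clip of the given duration."""
    return int(round(seconds * fps))


def horizon_ratio(test_seconds: float, train_seconds: float = TRAIN_SECONDS) -> float:
    """How many times longer the test horizon is than the training horizon."""
    return test_seconds / train_seconds


if __name__ == "__main__":
    print(f"training: {TRAIN_SECONDS}s = {frames_for(TRAIN_SECONDS)} frames")
    for minutes in (5, 30):
        test_seconds = minutes * 60
        print(
            f"testing: {minutes} min = {frames_for(test_seconds)} frames, "
            f"{horizon_ratio(test_seconds):.0f}x the training horizon"
        )
```

Under these assumptions, a 5-minute video is 60 times the training horizon and a 30-minute video is 360 times, which is why a gap that is invisible within 5s can dominate at test time.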

Haodong Li, Shaoteng Liu, Zhe Lin, Manmohan Chandraker • 2026

Related benchmarks

Task | Dataset | Result | Rank
Long Video Generation | VBench-Long 60 seconds | Subject Consistency 97.84 | 74
Video Generation | VBench 5s horizon 21 frames | Subjective Quality 0.979 | 11
Video Generation | VBench 30s horizon 120 frames | Subjective Quality Score 0.977 | 10
Long Video Generation | VBench-Long 120s | Aesthetic Quality 61.53 | 6
Long Video Generation | Long videos User Study (test) | Text Alignment 3.24 | 6
AR Video Synthesis | VBench++ Long 1-minute AR video synthesis | Subject Consistency 98.58 | 3
Autoregressive Video Synthesis | VBench Long (5-minute) | Subject Consistency 0.9804 | 3
