LIVE: Long-horizon Interactive Video World Modeling
About
Autoregressive video world models predict future visual observations conditioned on actions. While effective over short horizons, these models often struggle with long-horizon generation, as small prediction errors accumulate over time. Prior methods alleviate this by introducing pre-trained teacher models and sequence-level distribution matching, which incur additional computational cost and fail to prevent error propagation beyond the training horizon. In this work, we propose LIVE, a Long-horizon Interactive Video world modEl that enforces bounded error accumulation via a novel cycle-consistency objective, thereby eliminating the need for teacher-based distillation. Specifically, LIVE first performs a forward rollout from ground-truth frames and then applies a reverse generation process to reconstruct the initial state. The diffusion loss is subsequently computed on the reconstructed terminal state, providing an explicit constraint on long-horizon error propagation. Moreover, we provide a unified view that encompasses different approaches and introduce a progressive training curriculum to stabilize training. Experiments demonstrate that LIVE achieves state-of-the-art performance on long-horizon benchmarks, generating stable, high-quality videos far beyond training rollout lengths.
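The forward-then-reverse idea in the abstract can be illustrated with a toy sketch. This is not the LIVE implementation: `toy_step`, `gain`, and the use of negated actions for the reverse pass are hypothetical stand-ins, and a plain mean squared error replaces the diffusion loss. It only shows how a loss on the reconstructed state penalizes error accumulated over the whole rollout.

```python
from typing import List, Sequence

Frame = List[float]  # a frame flattened to a list of pixel values


def toy_step(frame: Frame, action: float, gain: float) -> Frame:
    """Hypothetical one-step predictor standing in for the video model."""
    return [gain * pixel + action for pixel in frame]


def cycle_consistency_loss(x0: Frame, actions: Sequence[float], gain: float) -> float:
    # Forward rollout: start from the ground-truth frame and apply each action.
    frame = list(x0)
    for action in actions:
        frame = toy_step(frame, action, gain)
    # Reverse generation: replay the actions in reverse order, negated
    # (an assumption of this toy), to reconstruct the initial frame.
    for action in reversed(actions):
        frame = toy_step(frame, -action, gain)
    # Stand-in for the diffusion loss: mean squared error against x0.
    return sum((r - t) ** 2 for r, t in zip(frame, x0)) / len(x0)


# gain=1.0 makes the toy model exactly invertible, so the cycle closes.
print(cycle_consistency_loss([1.0, 2.0], [0.5, -0.25], gain=1.0))  # 0.0
# gain=0.9 introduces a per-step error that the cycle loss exposes.
print(cycle_consistency_loss([1.0, 2.0], [0.5, -0.25], gain=0.9))
```

In this toy, any per-step error compounds across both passes, so the loss grows with rollout length unless the model stays consistent. That is the intuition behind bounding error accumulation with a cycle constraint.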
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Video Generation | RealEstate10K 0~64 frames (test) | PSNR 18.11 | 6 |
| Video Generation | RealEstate10K 0~128 frames (test) | PSNR 15.91 | 6 |
| Video Generation | RealEstate10K 0~200 frames (test) | PSNR 14.57 | 6 |
| Video Generation | RealEstate10K ≥256 frames (test) | PSNR 13.89 | 6 |
| Interactive World Modeling | UE Engine Realistic Game (0~64 frames) | PSNR 17.83 | 3 |
| Interactive World Modeling | UE Realistic Game Engine (0~128 frames) | PSNR 15.85 | 3 |
| Interactive World Modeling | UE Engine Realistic Game Engine (0~256 frames) | PSNR 14.04 | 3 |
| Interactive World Modeling | UE Realistic Game Engine (≥400 frames) | PSNR 12.96 | 3 |
| Interactive World Modeling | Minecraft Interactive Gameplay (0~32 frames) | PSNR 17.87 | 3 |
| Interactive World Modeling | Minecraft Interactive Gameplay (0~64 frames) | PSNR 16.31 | 3 |
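
For reference, the PSNR values above are in decibels, where higher means closer to the ground truth. A minimal sketch of the standard definition follows. It assumes 8-bit pixel values (peak 255) and a single flattened frame; the page does not state the peak value or how scores are averaged over frames in these benchmarks, so the function name and defaults here are illustrative only.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], prediction: Sequence[float],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two flattened frames."""
    if len(reference) != len(prediction) or len(reference) == 0:
        raise ValueError("frames must be non-empty and the same size")
    # Mean squared error over all pixel values.
    mse = sum((r - p) ** 2 for r, p in zip(reference, prediction)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_value ** 2 / mse)


# A uniform error of 25.5 on a 0-255 scale gives roughly 20 dB.
print(psnr([0.0, 0.0], [25.5, 25.5]))
```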