LIVE: Long-horizon Interactive Video World Modeling

About

Autoregressive video world models predict future visual observations conditioned on actions. While effective over short horizons, these models often struggle with long-horizon generation, as small prediction errors accumulate over time. Prior methods mitigate this by introducing pre-trained teacher models and sequence-level distribution matching, which incur additional computational cost and fail to prevent error propagation beyond the training horizon. In this work, we propose LIVE, a Long-horizon Interactive Video world modEl that enforces bounded error accumulation via a novel cycle-consistency objective, thereby eliminating the need for teacher-based distillation. Specifically, LIVE first performs a forward rollout from ground-truth frames and then applies a reverse generation process to reconstruct the initial state. The diffusion loss is then computed on the terminal state reconstructed by this reverse pass, providing an explicit constraint on long-horizon error propagation. Moreover, we provide a unified view that encompasses different approaches and introduce a progressive training curriculum to stabilize training. Experiments demonstrate that LIVE achieves state-of-the-art performance on long-horizon benchmarks, generating stable, high-quality videos far beyond the training rollout length.
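The forward-then-reverse objective described above can be illustrated with a toy sketch. It assumes frames are plain lists of floats, actions are scalars, and the hypothetical `forward` and `reverse` step functions stand in for the learned generators; a mean-squared error replaces the paper's diffusion loss purely for illustration. It is not the authors' implementation.

```python
from typing import Callable, List, Sequence

Frame = List[float]
# Hypothetical one-step generator: (current frame, action) -> next frame.
StepFn = Callable[[Frame, float], Frame]


def rollout(step: StepFn, start: Frame, actions: Sequence[float]) -> List[Frame]:
    """Autoregressive rollout: each prediction is fed back as the next input."""
    frames = [start]
    for action in actions:
        frames.append(step(frames[-1], action))
    return frames


def mse(x: Frame, y: Frame) -> float:
    """Stand-in (assumption) for the diffusion loss used in the paper."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def cycle_consistency_loss(forward: StepFn, reverse: StepFn,
                           start: Frame, actions: Sequence[float]) -> float:
    # 1) Forward rollout from the ground-truth start frame.
    predicted = rollout(forward, start, actions)
    # 2) Reverse generation from the last predicted frame, replaying the
    #    actions in reverse order to walk back toward the start.
    reconstructed = rollout(reverse, predicted[-1], list(reversed(actions)))
    # 3) Compare the reconstructed initial frame with the ground truth.
    return mse(reconstructed[-1], start)


# Toy models (hypothetical): an exact forward step, a drifting one, and an inverse.
def exact_forward(x: Frame, a: float) -> Frame:
    return [v + a for v in x]


def drifting_forward(x: Frame, a: float) -> Frame:
    return [v + a + 0.01 for v in x]  # small per-step bias


def inverse_step(x: Frame, a: float) -> Frame:
    return [v - a for v in x]


if __name__ == "__main__":
    start, actions = [0.0, 1.0], [0.5] * 10
    print(cycle_consistency_loss(exact_forward, inverse_step, start, actions))     # 0.0
    print(cycle_consistency_loss(drifting_forward, inverse_step, start, actions))  # ~0.01
```

With an exact inverse step the loss is zero. A forward step with a 0.01 per-step bias leaves a gap of about 0.1 after ten steps (loss about 0.01) and about 0.2 after twenty (loss about 0.04), so the objective scores the drift accumulated over the whole rollout rather than the error of any single step.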

Junchao Huang, Ziyang Ye, Xinting Hu, Tianyu He, Guiyu Zhang, Shaoshuai Shi, Jiang Bian, Li Jiang (2026)

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Video Generation | RealEstate10K, 0~64 frames (test) | PSNR (dB) | 18.11 | 6 |
| Video Generation | RealEstate10K, 0~128 frames (test) | PSNR (dB) | 15.91 | 6 |
| Video Generation | RealEstate10K, 0~200 frames (test) | PSNR (dB) | 14.57 | 6 |
| Video Generation | RealEstate10K, ≥256 frames (test) | PSNR (dB) | 13.89 | 6 |
| Interactive World Modeling | UE Realistic Game Engine, 0~64 frames | PSNR (dB) | 17.83 | 3 |
| Interactive World Modeling | UE Realistic Game Engine, 0~128 frames | PSNR (dB) | 15.85 | 3 |
| Interactive World Modeling | UE Realistic Game Engine, 0~256 frames | PSNR (dB) | 14.04 | 3 |
| Interactive World Modeling | UE Realistic Game Engine, ≥400 frames | PSNR (dB) | 12.96 | 3 |
| Interactive World Modeling | Minecraft Interactive Gameplay, 0~32 frames | PSNR (dB) | 17.87 | 3 |
| Interactive World Modeling | Minecraft Interactive Gameplay, 0~64 frames | PSNR (dB) | 16.31 | 3 |

Showing 10 of 12 rows.
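Every result above is a PSNR score over a frame range, and the scores fall as the range extends further from the ground-truth context. The sketch below assumes the common definition, PSNR = 10·log10(MAX²/MSE) with pixel values scaled to [0, 1], and assumes each range score is the mean of per-frame PSNR; the benchmarks' exact evaluation protocol is not stated here, and `range_psnr` is a hypothetical helper name.

```python
import math
from typing import List, Sequence


def psnr(pred: Sequence[float], target: Sequence[float], max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for one frame (pixels in [0, max_val])."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_val ** 2 / mse)


def range_psnr(preds: List[Sequence[float]], targets: List[Sequence[float]],
               start: int, end: int, max_val: float = 1.0) -> float:
    """Mean per-frame PSNR over frame indices [start, end) (assumed protocol)."""
    scores = [psnr(p, t, max_val) for p, t in zip(preds[start:end], targets[start:end])]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    targets = [[0.0, 0.0]] * 4
    # Per-pixel error is larger on later frames, mimicking accumulated drift.
    preds = [[0.01, 0.01], [0.01, 0.01], [0.1, 0.1], [0.1, 0.1]]
    print(range_psnr(preds, targets, 0, 2))  # ~40 dB on early frames
    print(range_psnr(preds, targets, 2, 4))  # ~20 dB on later frames
```

Because PSNR is logarithmic in the error, a tenfold increase in per-pixel error costs 20 dB, which is why longer frame ranges in the table report lower scores.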

Other info

GitHub
