WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling

About

This paper presents WorldPlay, a streaming video diffusion model that enables real-time, interactive world modeling with long-term geometric consistency, resolving the trade-off between speed and memory that limits current methods. WorldPlay rests on three key innovations. 1) We use a Dual Action Representation to enable robust action control in response to the user's keyboard and mouse inputs. 2) To enforce long-term consistency, our Reconstituted Context Memory dynamically rebuilds context from past frames and uses temporal reframing to keep geometrically important but long-past frames accessible, effectively alleviating memory attenuation. 3) We also propose Context Forcing, a novel distillation method designed for memory-aware models. Aligning the memory context between the teacher and the student preserves the student's capacity to use long-range information, enabling real-time speeds while preventing error drift. Taken together, WorldPlay generates long-horizon streaming 720p video at 24 FPS with superior consistency, comparing favorably with existing techniques and showing strong generalization across diverse scenes. The project page and online demo can be found at https://3d-models.hunyuan.tencent.com/world/ and https://3d.hunyuan.tencent.com/sceneTo3D.
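The abstract describes the Reconstituted Context Memory only at a high level. The Python sketch below illustrates the general idea under our own assumptions and is not WorldPlay's implementation: it keeps the most recent frames, retrieves the long-past frames that score highest on a hypothetical geometric relevance score (camera position distance plus heading difference), and models "temporal reframing" as reassigning contiguous temporal position indices so retrieved frames sit just before the current frame instead of far in the past. The names `Frame`, `relevance`, `rebuild_context`, `num_recent`, and `num_memory` are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Frame:
    index: int        # original timestep of the frame in the stream
    position: tuple   # hypothetical camera position (x, y, z)
    yaw: float        # hypothetical camera heading in radians


def relevance(past: Frame, current: Frame) -> float:
    """Hypothetical geometric score: nearer cameras with similar headings score higher."""
    dist = math.dist(past.position, current.position)
    # Smallest absolute angle between the two headings, in [0, pi].
    dyaw = abs((past.yaw - current.yaw + math.pi) % (2 * math.pi) - math.pi)
    return -dist - dyaw


def rebuild_context(history, current, num_recent=4, num_memory=2):
    """Return (frame, reframed_position) pairs used as context for the current frame."""
    assert num_recent >= 1, "keep at least one recent frame"
    recent = history[-num_recent:]
    older = history[:-num_recent]
    # Retrieve the most geometrically relevant long-past frames.
    memory = sorted(older, key=lambda f: relevance(f, current), reverse=True)[:num_memory]
    memory.sort(key=lambda f: f.index)  # keep retrieved frames in chronological order
    context = memory + recent
    # Temporal reframing (assumed form): contiguous positions ending right before
    # the current frame, so retrieved frames no longer look long-past.
    start = current.index - len(context)
    return [(f, start + i) for i, f in enumerate(context)]


# Camera moves along +x for 10 frames, then returns to the origin at frame 10.
history = [Frame(index=i, position=(float(i), 0.0, 0.0), yaw=0.0) for i in range(10)]
current = Frame(index=10, position=(0.0, 0.0, 0.0), yaw=0.0)
print([(f.index, p) for f, p in rebuild_context(history, current)])
# → [(0, 4), (1, 5), (6, 6), (7, 7), (8, 8), (9, 9)]
```

In this toy run, frames 0 and 1 are revisited geometrically, so they are pulled into the context and reframed to positions 4 and 5, right before the recent window. A real system would more likely score relevance by view overlap and operate on latent chunks rather than single frames.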

Wenqiang Sun, Haiyu Zhang, Haoyuan Wang, Junta Wu, Zehan Wang, Zhenwei Wang, Yunhong Wang, Jun Zhang, Tengfei Wang, Chunchao Guo• 2025

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Video Generation | VBench (test) | -- | 66 |
| Video Generation | RealEstate10K (Re10K) (test) | PSNR 16.013 | 16 |
| Video Generation | VBench | AQ 0.601 | 11 |
| Action-controlled Video Generation | WorldPlay Short-term 61 frames (test) | PSNR 21.92 | 9 |
| Action-controlled Video Generation | WorldPlay Long-term ≥ 250 frames (test) | PSNR 18.94 | 9 |
| Long-horizon Video Generation | nuScenes (val) | FID 33.51 | 9 |
| Video World Modeling | STEVO-Bench | State Progress 0.00e+0 | 9 |
| Video Generation | 55 in-the-wild images Exploration | AQ 60.16 | 8 |
| Long-horizon Video Generation | RealEstate10K Revisiting | PSNR 16.31 | 8 |
| Long-horizon Video Generation | RealEstate10K (Exploration) | AQ 49.24 | 8 |
Showing 10 of 27 rows
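Several rows above report PSNR (peak signal-to-noise ratio, in dB), where higher means the generated frames are closer to the reference frames. As a reference point, the sketch below computes the standard definition, PSNR = 10 · log10(MAX² / MSE), on flat lists of pixel values. The pixel range (`max_value=1.0`) and the flat-list input are assumptions; the benchmarks' exact protocols (value range, color space, per-frame versus per-video averaging) are not stated on this page.

```python
import math


def psnr(reference, generated, max_value=1.0):
    """PSNR in dB between two equal-length, non-empty sequences of pixel values."""
    if len(reference) != len(generated) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all pixel values.
    mse = sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs
    return 10 * math.log10(max_value ** 2 / mse)


# A uniform error of 0.1 on a [0, 1] scale gives MSE = 0.01, i.e. 20 dB.
print(round(psnr([0.0] * 4, [0.1] * 4), 2))  # → 20.0
```

Because the scale is logarithmic, the gap between roughly 22 dB (short-term) and roughly 19 dB (long-term) in the table corresponds to about a doubling of mean squared error.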

Other info

GitHub
