
Hunyuan-GameCraft: High-dynamic Interactive Game Video Generation with Hybrid History Condition

About

Recent advances in diffusion-based and controllable video generation have enabled high-quality and temporally coherent video synthesis, laying the groundwork for immersive interactive gaming experiences. However, current methods fall short in dynamics, generality, long-term consistency, and efficiency, which restricts the variety of gameplay videos they can produce. To address these gaps, we introduce Hunyuan-GameCraft, a novel framework for high-dynamic interactive video generation in game environments. To achieve fine-grained action control, we unify standard keyboard and mouse inputs into a shared camera representation space, which allows smooth interpolation between camera and movement operations. We then propose a hybrid history-conditioned training strategy that extends video sequences autoregressively while preserving game scene information. To improve inference efficiency and playability, we apply model distillation, which reduces computational overhead while maintaining consistency across long temporal sequences and makes the model suitable for real-time deployment in complex interactive environments. The model is trained on a large-scale dataset of over one million gameplay recordings from more than 100 AAA games, ensuring broad coverage and diversity, and is then fine-tuned on a carefully annotated synthetic dataset to enhance precision and control. The curated game scene data significantly improves visual fidelity, realism, and action controllability. Extensive experiments demonstrate that Hunyuan-GameCraft significantly outperforms existing models, advancing the realism and playability of interactive game video generation.
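
The idea of mapping keyboard and mouse inputs into one continuous camera space can be illustrated with a small sketch. This is not the paper's encoding: the key table, the `CameraAction` fields, the `speed` and `sensitivity` parameters, and the linear interpolation are all hypothetical choices made only to show why a shared continuous representation lets discrete key presses and mouse motion blend smoothly.

```python
from dataclasses import dataclass

# Hypothetical key table (assumption): camera-local x = right, z = forward.
KEY_DIRECTIONS = {
    "W": (0.0, 0.0, 1.0),
    "S": (0.0, 0.0, -1.0),
    "A": (-1.0, 0.0, 0.0),
    "D": (1.0, 0.0, 0.0),
}


@dataclass(frozen=True)
class CameraAction:
    """Illustrative continuous action: per-frame translation plus yaw/pitch rates."""
    translation: tuple  # (x, y, z) in camera-local units per frame
    yaw: float          # radians per frame, positive turns right
    pitch: float        # radians per frame, positive looks up


def encode_action(keys, mouse_dx=0.0, mouse_dy=0.0, speed=1.0, sensitivity=0.01):
    """Map pressed keys and mouse deltas to one CameraAction (assumed scheme)."""
    x = y = z = 0.0
    for key in keys:
        dx, dy, dz = KEY_DIRECTIONS[key.upper()]
        x, y, z = x + dx, y + dy, z + dz
    norm = (x * x + y * y + z * z) ** 0.5
    if norm > 0:
        # Normalize so diagonal movement is not faster than axis-aligned movement.
        x, y, z = speed * x / norm, speed * y / norm, speed * z / norm
    # Screen-space y grows downward, so moving the mouse up gives positive pitch.
    return CameraAction((x, y, z), sensitivity * mouse_dx, -sensitivity * mouse_dy)


def interpolate(a, b, t):
    """Linearly blend two actions; t=0 returns a, t=1 returns b."""
    def lerp(p, q):
        return (1.0 - t) * p + t * q

    translation = tuple(lerp(p, q) for p, q in zip(a.translation, b.translation))
    return CameraAction(translation, lerp(a.yaw, b.yaw), lerp(a.pitch, b.pitch))
```

For example, `interpolate(encode_action(["W"]), encode_action([], mouse_dx=100.0), 0.5)` yields an action that moves forward at half speed while turning at half the yaw rate, something a purely discrete key vocabulary could not express.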

Jiaqi Li, Junshu Tang, Zhiyong Xu, Longhuang Wu, Yuan Zhou, Shuai Shao, Tianbao Yu, Zhiguo Cao, Qinglin Lu • 2025

Related benchmarks

Task | Dataset | Result | Rank
Video Generation | VBench | Quality Score: 60.1 | 126
Video Generation | VBench Long | -- | 23
Action-controlled Video Generation | WorldPlay Short-term 61 frames (test) | PSNR: 21.05 | 9
Action-controlled Video Generation | WorldPlay Long-term ≥ 250 frames (test) | PSNR: 10.09 | 9
Controllable Video Generation | LongVGenBench (test) | Appearance Quality (A.Q.): 56.18 | 8
Single-event Scene Revisit (Different Pose) | LiveBench | DINO Feature Similarity (FG): 0.475 | 8
Single-event Scene Revisit (Same Pose) | LiveBench | PSNR (Background): 17.637 | 8
View Synthesis | ViewBench 75 deg | PSNR: 15.19 | 6
Video Generation | VBench-Long User Study | Video Quality: 10.7 | 6
View Synthesis | ViewBench 30 deg | PSNR: 17.04 | 6
Showing 10 of 20 rows
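
Several of the results above are reported as PSNR (peak signal-to-noise ratio), where higher values mean the generated frame is closer to the reference. As a reminder of what the number measures, here is a minimal sketch of the standard definition over flat lists of pixel values. The function name and the 8-bit `max_value` default are assumptions; this page does not state how each benchmark preprocesses or averages frames, so the sketch is not a reproduction of their evaluation protocol.

```python
import math


def psnr(reference, generated, max_value=255.0):
    """Standard PSNR in dB between two equal-length sequences of pixel values."""
    if len(reference) != len(generated) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and have the same length")
    # Mean squared error over all pixel positions.
    mse = sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Because the scale is logarithmic, the gap between the short-term and long-term WorldPlay scores (21.05 versus 10.09) corresponds to roughly a twelvefold increase in mean squared error, assuming both use the same peak value.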
