HunyuanVideo 1.5 Technical Report
About
We present HunyuanVideo 1.5, a lightweight yet powerful open-source video generation model that achieves state-of-the-art visual quality and motion coherence with only 8.3 billion parameters, enabling efficient inference on consumer-grade GPUs. This achievement is built upon several key components, including meticulous data curation, an advanced DiT architecture featuring selective and sliding tile attention (SSTA), enhanced bilingual understanding through glyph-aware text encoding, progressive pre-training and post-training, and an efficient video super-resolution network. Leveraging these designs, we developed a unified framework capable of high-quality text-to-video and image-to-video generation across multiple durations and resolutions. Extensive experiments demonstrate that this compact and proficient model establishes a new state-of-the-art among open-source video generation models. By releasing the code and model weights, we provide the community with a high-performance foundation that lowers the barrier to video creation and research, making advanced video generation accessible to a broader audience. All open-source assets are publicly available at https://github.com/Tencent-Hunyuan/HunyuanVideo-1.5.
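The abstract names selective and sliding tile attention (SSTA) without giving details. The sketch below is a hedged illustration of the sliding-tile half only. It builds the boolean sparsity mask that tile-based sliding-window attention applies over a video latent grid. Tokens are grouped into 3D tiles, and each query attends only to keys whose tile lies within a fixed window of its own tile. The grid shape, tile size, window size, and the function names `sliding_tile_mask` and `tile_index` are hypothetical choices made for illustration. The selective component, along with the report's actual tiling, kernels, and parameters, is not modeled here.

```python
from itertools import product


def tile_index(t, h, w, tile):
    """Return the (t, h, w) index of the tile containing token (t, h, w)."""
    tt, th, tw = tile
    return (t // tt, h // th, w // tw)


def sliding_tile_mask(grid, tile, window):
    """Boolean mask for sliding tile attention over a 3D token grid (illustrative sketch).

    grid:   (T, H, W) number of latent tokens along time, height, width (hypothetical)
    tile:   (tt, th, tw) tile size; each entry must divide the matching grid size
    window: (wt, wh, ww) neighbourhood half-width, measured in tiles
    Returns a list of rows where mask[q][k] is True if query q may attend to key k.
    Tokens are flattened in (t, h, w) row-major order.
    """
    if any(g % s for g, s in zip(grid, tile)):
        raise ValueError("each tile dimension must divide the grid dimension")
    T, H, W = grid
    # Tile index of every token, in flattened (t, h, w) order.
    tiles = [tile_index(t, h, w, tile) for t, h, w in product(range(T), range(H), range(W))]
    mask = []
    for q_tile in tiles:
        # A key is visible when its tile is within `window` tiles of the
        # query's tile along every axis (a 3D sliding window over tiles).
        row = [all(abs(q - k) <= r for q, k, r in zip(q_tile, k_tile, window))
               for k_tile in tiles]
        mask.append(row)
    return mask


if __name__ == "__main__":
    mask = sliding_tile_mask(grid=(2, 8, 8), tile=(1, 2, 2), window=(0, 1, 1))
    density = sum(map(sum, mask)) / len(mask) ** 2
    print(f"{len(mask)} tokens, {density:.3f} of the full attention matrix kept")
    # → 128 tokens, 0.195 of the full attention matrix kept
```

Working at tile granularity means that all queries in a tile share one key set. An implementation can therefore skip whole blocks of the attention matrix rather than masking individual entries. This is the usual motivation for tile-based sparse attention and is not a claim about the report's kernel.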
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Generation | ImageNet 256x256 | IS | 69.59 | 517 |
| Video Generation | VBench | Total Score | 83.43 | 42 |
| Video Generation | Short videos (81 frames, 240 prompts) | Total Score | 6.9 | 38 |
| Image Reconstruction | ImageNet 256p | PSNR | 31.18 | 38 |
| Image Reconstruction | OmniDoc-TokenBench 256x256 (test) | SSIM | 84.22 | 23 |
| Image Reconstruction | FFHQ 1k | PSNR | 37.3 | 21 |
| Physical Plausibility Evaluation | VideoPhy | Average PC | 28.2 | 16 |
| Image-to-Video Generation | VBench 1.5 (test) | I2V Score | 94.13 | 15 |
| Implicit Image-to-Video (Implicit I2V) | IntelligentVBench | IF Score | 3.98 | 12 |
| Video Reasoning | Sokoban (test) | Precision | 8.2 | 11 |