| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Text-to-Video Generation | HunyuanVideo | LPIPS 0 | 22 |
| Text-to-Video Generation | HunyuanVideo 13B CFG = 6.0, 720 × 1280p, frames = 60 (test) | CLIPSIM 0.184 | 11 |
| Video Generation | HunyuanVideo 13B (test) | CLIPSIM 0.184 | 11 |
| Text-to-Video Generation | HunyuanVideo 544p × 860p, 17 frames | VBench Score 82.45 | 9 |
| Text-to-Video Generation | HunyuanVideo VBench prompts (test) | PSNR 32.39 | 8 |
| Text-to-Video Generation | HunyuanVideo (test) | Quality Score 83.6 | 8 |
| Video Generation | HunyuanVideo 480P, 65 frames 1.0 (test) | VBench Score 80.66 | 7 |
| Text-to-Video Generation | HunyuanVideo 480p × 640p, 45 frames | VBench (%) 80.14 | 7 |
| Video Generation | HunyuanVideo 117 frames (test) | Vision Reward 0.15 | 7 |
| Image-to-Video Generation | HunyuanVideo 1.5 | Q-Save 10.05 | 6 |
| Visual Harmfulness Evaluation | HunyuanVideo | Pornography 100 | 4 |
| Video Generation | HunyuanVideo 129 frames 544P | VBench Score 82.48 | 4 |
| Video Generation | HunyuanVideo 17 frames, 544P | VBench Score 82.08 | 4 |
| Watermark Extraction | HunyuanVideo I2V (Temporal Disturbance, 512x16 bits) | Bit Accuracy (Baseline) 99 | 3 |
| Watermark Extraction | HunyuanVideo I2V (Spatial Disturbance, 512 bits) | Bit Accuracy (Baseline) 100 | 3 |