CreativeVR: Diffusion-Prior-Guided Approach for Structure and Motion Restoration in Generative and Real Videos
About
Modern text-to-video (T2V) diffusion models can synthesize visually compelling clips, yet they remain brittle at fine-scale structure: even state-of-the-art generators often produce distorted faces and hands, warped backgrounds, and temporally inconsistent motion. Such severe structural artifacts also appear in very low-quality real-world videos. Classical video restoration and super-resolution (VR/VSR) methods, in contrast, are tuned for synthetic degradations such as blur and downsampling and tend to stabilize these artifacts rather than repair them, while diffusion-prior restorers are usually trained on photometric noise and offer little control over the trade-off between perceptual quality and fidelity. We introduce CreativeVR, a diffusion-prior-guided video restoration framework for AI-generated (AIGC) and real videos with severe structural and temporal artifacts. Our deep-adapter-based method exposes a single precision knob that controls how strongly the model follows the input, smoothly trading off between precise restoration on standard degradations and stronger structure- and motion-corrective behavior on challenging content. Our key novelty is a temporally coherent degradation module used during training, which applies carefully designed transformations that produce realistic structural failures. To evaluate AIGC-artifact restoration, we propose the AIGC54 benchmark with FIQA, semantic and perceptual metrics, and multi-aspect scoring. CreativeVR achieves state-of-the-art results on videos with severe artifacts and performs competitively on standard video restoration benchmarks, while running at practical throughput (about 13 FPS at 720p on a single 80-GB A100). Project page: https://daveishan.github.io/creativevr-webpage/.
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Video Super-Resolution | SPMCS (test) | Avg. PSNR | 26.286 | 36 |
| Video Restoration | REDS30 | PSNR | 25.51 | 17 |
| Video Restoration | REDS30 Spatial Downsampling | PSNR | 27.12 | 10 |
| Video Restoration | REDS Spatio-Temporal Light 30 | PSNR | 21.18 | 10 |
| Video Restoration | REDS30 Spatio-Temporal Strong | PSNR | 20.95 | 10 |
| Video Restoration | YouHQ40 Spatial Downsampling | PSNR | 27.2 | 10 |
| Video Restoration | YouHQ40 Spatio-Temporal Downsampling | PSNR | 25.84 | 10 |
| Video Restoration | UDM10 (test) | PSNR | 29.68 | 10 |
| Video Restoration | YouHQ40 Spatio-Temporal Light | PSNR | 21.9 | 10 |
| Video Restoration | YouHQ40 Spatio-Temporal Strong | PSNR | 21.92 | 10 |
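
The results above are reported as PSNR. For readers unfamiliar with the metric, the following is a minimal sketch of the standard PSNR definition, 10 · log10(MAX² / MSE), and of a per-frame average over a clip. It is not the benchmarks' evaluation code: the function names `psnr` and `average_psnr`, the flat-list frame representation, and the 8-bit peak value of 255 are assumptions made for illustration. Actual benchmark protocols may differ, for example in color space or border cropping.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Return PSNR in dB between two frames given as flat sequences of pixel values.

    Assumes both frames have the same number of pixels and share the
    same value range [0, max_value] (255 for 8-bit images).
    """
    if len(reference) != len(restored):
        raise ValueError("frames must have the same number of pixels")
    if len(reference) == 0:
        raise ValueError("frames must not be empty")

    # Mean squared error over all pixels.
    mse = sum((r - x) ** 2 for r, x in zip(reference, restored)) / len(reference)

    # Identical frames have zero error, so PSNR is unbounded.
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


def average_psnr(reference_frames, restored_frames, max_value=255.0):
    """Return the mean of per-frame PSNR values over a clip (hypothetical helper)."""
    if len(reference_frames) != len(restored_frames):
        raise ValueError("clips must have the same number of frames")
    scores = [
        psnr(ref, out, max_value)
        for ref, out in zip(reference_frames, restored_frames)
    ]
    return sum(scores) / len(scores)
```

Averaging per-frame PSNR is one common convention. Another is to compute a single MSE over the whole clip, and the two generally give different numbers, so scores are only comparable when the protocol matches.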