ReCamMaster: Camera-Controlled Generative Rendering from A Single Video
About
Camera control has been actively studied in text- or image-conditioned video generation tasks. However, altering the camera trajectory of a given video remains under-explored, despite its importance in video creation. The task is non-trivial because of the extra constraints of maintaining multi-frame appearance and dynamic synchronization. To address this, we present ReCamMaster, a camera-controlled generative video re-rendering framework that reproduces the dynamic scene of an input video along novel camera trajectories. The core innovation lies in harnessing the generative capabilities of pre-trained text-to-video models through a simple yet powerful video conditioning mechanism, whose capability is often overlooked in current research. To overcome the scarcity of qualified training data, we construct a comprehensive multi-camera synchronized video dataset using Unreal Engine 5, carefully curated to follow real-world filming characteristics and covering diverse scenes and camera movements. This dataset helps the model generalize to in-the-wild videos. Lastly, we further improve robustness to diverse inputs through a meticulously designed training strategy. Extensive experiments show that our method substantially outperforms existing state-of-the-art approaches. Our method also finds promising applications in video stabilization, super-resolution, and outpainting. Our code and dataset are publicly available at: https://github.com/KwaiVGI/ReCamMaster.
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Video Generation | VBench | -- | -- | 102 |
| View Synchronization | Basic Benchmark (test) | FVD | 675.4 | 20 |
| Camera Controllability | RealEstate10K (test) | mRotErr | 1.935 | 10 |
| Multi-shot Video Generation | 90 prompts evaluation suite | Type Accuracy | 3.33 | 9 |
| Stereo Image Conversion | Marvel-10K | PSNR | 30.44 | 8 |
| Stereo Video Conversion | Marvel-10K | PSNR | 30.41 | 8 |
| Camera control | UltraVideo (test) | DINO | 0.0504 | 7 |
| Stereo Video Synthesis | Stereo4D Parallel Format | MS-SSIM | 52.5 | 7 |
| Video Novel View Synthesis | Synthetic multi-camera video dataset (test) | Refinement Error (Iter 1 vs 2) | 0.1181 | 6 |
| Cinematic Video Generation | Scene-Decoupled Video Dataset (test) | CLIP-T | 30.86 | 6 |