Light-X: Generative 4D Video Rendering with Camera and Illumination Control
About
Recent advances in illumination control extend image-based methods to video, yet they still face a trade-off between lighting fidelity and temporal consistency. Moving beyond relighting, a key step toward generative modeling of real-world scenes is the joint control of camera trajectory and illumination, since visual dynamics are inherently shaped by both geometry and lighting. To this end, we present Light-X, a video generation framework that enables controllable rendering from monocular videos with both viewpoint and illumination control. 1) We propose a disentangled design that decouples geometry and lighting signals: geometry and motion are captured via dynamic point clouds projected along user-defined camera trajectories, while illumination cues are provided by a relit frame consistently projected into the same geometry. These explicit, fine-grained cues enable effective disentanglement and guide high-quality illumination. 2) To address the lack of paired multi-view and multi-illumination videos, we introduce Light-Syn, a degradation-based pipeline with inverse mapping that synthesizes training pairs from in-the-wild monocular footage. This strategy yields a dataset covering static, dynamic, and AI-generated scenes, ensuring robust training. Extensive experiments show that Light-X outperforms baseline methods in joint camera-illumination control and surpasses prior video relighting methods under both text- and background-conditioned settings.
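The geometry cue described above (a point cloud projected along a user-defined camera trajectory) can be illustrated with a minimal NumPy sketch. This is not the Light-X implementation: the function names `project_points` and `render_trajectory`, the pinhole camera model, and the z-buffer splatting are all assumptions chosen for illustration. Passing colors taken from a relit frame as `colors` would give a rough analogue of the projected illumination cue.

```python
import numpy as np

def project_points(points_world, colors, K, R, t, height, width):
    """Splat a colored point cloud into one pinhole camera (hypothetical helper).

    points_world: (N, 3) world coordinates; colors: (N, 3) per-point colors.
    K: (3, 3) intrinsics; R, t: world-to-camera rotation (3, 3) and translation (3,).
    Returns an (H, W, 3) image and an (H, W) boolean visibility mask.
    """
    cam = points_world @ R.T + t              # world -> camera coordinates
    z = cam[:, 2]
    in_front = z > 1e-6                       # drop points behind the camera
    cam, z, cols = cam[in_front], z[in_front], colors[in_front]

    uvw = cam @ K.T                           # perspective projection
    u = np.round(uvw[:, 0] / z).astype(int)
    v = np.round(uvw[:, 1] / z).astype(int)
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, z, cols = u[inside], v[inside], z[inside], cols[inside]

    # Z-buffer: keep the nearest point per pixel.
    flat = v * width + u
    depth = np.full(height * width, np.inf)
    np.minimum.at(depth, flat, z)
    nearest = z == depth[flat]

    image = np.zeros((height * width, 3), dtype=float)
    image[flat[nearest]] = cols[nearest]
    mask = np.isfinite(depth)
    return image.reshape(height, width, 3), mask.reshape(height, width)

def render_trajectory(points_world, colors, K, poses, height, width):
    """Render the same point cloud from each (R, t) pose in a trajectory."""
    return [project_points(points_world, colors, K, R, t, height, width)
            for R, t in poses]

# Toy usage: two points on the same ray; the nearer (green) one should win.
K = np.array([[10.0, 0.0, 2.0], [0.0, 10.0, 2.0], [0.0, 0.0, 1.0]])
pts = np.array([[0.0, 0.0, 2.0], [0.0, 0.0, 1.0], [0.0, 0.0, -1.0]])
cols = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
frames = render_trajectory(pts, cols, K, [(np.eye(3), np.zeros(3))], 5, 5)
image0, mask0 = frames[0]
```

The returned mask marks pixels covered by at least one point; in a sketch like this, the uncovered pixels are the regions a generative model would have to fill in.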
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| 4D Video Relighting | 100-video benchmark (test) | PSNR (Frame) | 13.831 | 15 |
| Joint camera-illumination control | Joint Camera-Illumination Control Benchmark | Aesthetic Score | 0.623 | 10 |
| Foreground Video Relighting | Background image-conditioned foreground video relighting dataset (test) | Aesthetic Score | 0.682 | 5 |
| Joint camera-illumination control | 200 real in-the-wild videos (Pexels, Sora, Kling) | PSNR | 13.96 | 5 |
| Joint camera-illumination control | Real in-the-wild videos (test) | Aesthetic Score | 0.623 | 5 |
| Joint camera-illumination control | iPhone multi-view dataset (test) | Aesthetic Score | 0.557 | 5 |
| Video Relighting | Video Relighting Dataset (test) | Aesthetic Score | 0.645 | 4 |
| 4D Video Relighting | User Study 30° viewpoint | Prompt Match Score | 4.2 | 4 |
| 4D Video Relighting | User Study 90° viewpoint | Prompt Match | 4 | 4 |
| 4D Video Relighting | User Study 180° viewpoint | Prompt Match Score | 3.6 | 4 |
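
For readers unfamiliar with the PSNR figures in the table, the standard definition is $\mathrm{PSNR} = 10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$, in decibels. The sketch below computes it per frame and averages over a video. Whether the "PSNR (Frame)" entry was computed exactly this way is an assumption; the function names `psnr` and `framewise_psnr` are hypothetical.

```python
import numpy as np

def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two same-shaped arrays."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical inputs
    return float(10.0 * np.log10(max_value ** 2 / mse))

def framewise_psnr(ref_video, est_video, max_value=1.0):
    """Mean of per-frame PSNR values (assumed reading of a frame-level PSNR)."""
    scores = [psnr(r, e, max_value) for r, e in zip(ref_video, est_video)]
    return float(np.mean(scores))

# Toy usage: a constant error of 0.1 on a [0, 1] image gives MSE = 0.01.
ref = np.zeros((2, 4, 4, 3))
est = np.full((2, 4, 4, 3), 0.1)
score = framewise_psnr(ref, est)
```

Note that the mean becomes infinite if any frame matches its reference exactly, so real evaluation code typically guards against that case.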