V-RGBX: Video Editing with Accurate Controls over Intrinsic Properties

About

Large-scale video generation models have shown remarkable potential in modeling photorealistic appearance and lighting interactions in real-world scenes. However, a closed-loop framework that jointly understands intrinsic scene properties (e.g., albedo, normal, material, and irradiance), leverages them for video synthesis, and supports editable intrinsic representations remains unexplored. We present V-RGBX, the first end-to-end framework for intrinsic-aware video editing. V-RGBX unifies three key capabilities: (1) video inverse rendering into intrinsic channels, (2) photorealistic video synthesis from these intrinsic representations, and (3) keyframe-based video editing conditioned on intrinsic channels. At the core of V-RGBX is an interleaved conditioning mechanism that enables intuitive, physically grounded video editing through user-selected keyframes, supporting flexible manipulation of any intrinsic modality. Extensive qualitative and quantitative results show that V-RGBX produces temporally consistent, photorealistic videos while propagating keyframe edits across sequences in a physically plausible manner. We demonstrate its effectiveness in diverse applications, including object appearance editing and scene-level relighting, surpassing the performance of prior methods.

Ye Fang, Tong Wu, Valentin Deschaintre, Duygu Ceylan, Iliyan Georgiev, Chun-Hao Paul Huang, Yiwei Hu, Xuelin Chen, Tuanfeng Yang Wang • 2025

Related benchmarks

Task	Dataset	Result	Rank
Forward Rendering (X to RGB)	Indoor Synthetic Dataset	PSNR 22.42	4
Albedo Estimation	Synthetic Indoor Dataset (test)	PSNR 17.73	3
Normal Estimation	Synthetic Indoor Dataset (test)	PSNR 21.59	3
RGB -> X -> RGB cycle consistency	Evermotion	PSNR 22.57	3
RGB -> X -> RGB cycle consistency	RealEstate10K	PSNR 17.88	3
Irradiance Estimation	Synthetic Indoor Dataset (test)	PSNR 19.94	2
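
Every result in the table is reported as PSNR (peak signal-to-noise ratio, in dB; higher is better). The page does not say how the metric was aggregated (per frame or per video) or what value range the pixels use, so the following is only a minimal sketch of the standard PSNR definition, assuming pixel values in [0, 1]. The function name `psnr` and the toy data are illustrative and not taken from V-RGBX.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], estimate: Sequence[float],
         max_value: float = 1.0) -> float:
    """Standard PSNR in dB between two equally sized pixel sequences.

    Assumption: pixels are floats in [0, max_value]; the benchmark's
    actual range and aggregation are not stated on this page.
    """
    if len(reference) == 0 or len(reference) != len(estimate):
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixel values.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical signals have unbounded PSNR
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy data: four pixel values, each estimate off by 0.1.
reference = [0.2, 0.4, 0.6, 0.8]
estimate = [0.3, 0.5, 0.7, 0.9]
score = psnr(reference, estimate)
```

With a uniform error of 0.1 on a [0, 1] range, the mean squared error is about 0.01 and `score` comes out at roughly 20 dB, which gives a sense of scale for the 17 to 23 dB values listed above.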


