SC-GS: Sparse-Controlled Gaussian Splatting for Editable Dynamic Scenes
About
Novel view synthesis for dynamic scenes is still a challenging problem in computer vision and graphics. Recently, Gaussian splatting has emerged as a robust technique to represent static scenes and enable high-quality, real-time novel view synthesis. Building upon this technique, we propose a new representation that explicitly decomposes the motion and appearance of dynamic scenes into sparse control points and dense Gaussians, respectively. Our key idea is to use sparse control points, significantly fewer in number than the Gaussians, to learn compact 6-DoF transformation bases, which can be locally interpolated through learned interpolation weights to yield the motion field of the 3D Gaussians. We employ a deformation MLP to predict time-varying 6-DoF transformations for each control point, which reduces learning complexity, enhances learning ability, and facilitates obtaining temporally and spatially coherent motion patterns. Then, we jointly learn the 3D Gaussians, the canonical-space locations of the control points, and the deformation MLP to reconstruct the appearance, geometry, and dynamics of 3D scenes. During learning, the location and number of control points are adaptively adjusted to accommodate varying motion complexities in different regions, and an ARAP loss, following the as-rigid-as-possible principle, is developed to enforce spatial continuity and local rigidity of the learned motions. Finally, thanks to the explicit sparse motion representation and its decomposition from appearance, our method enables user-controlled motion editing while retaining high-fidelity appearance. Extensive experiments demonstrate that our approach outperforms existing approaches on novel view synthesis with a high rendering speed and enables novel appearance-preserving motion editing applications. Project page: https://yihua7.github.io/SC-GS-web/
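To make the interpolation idea concrete, here is a minimal sketch of how a dense point could be moved by blending the rigid transforms of its nearest control points. It is an illustration under stated assumptions, not the authors' implementation: the blend is written in a linear-blend-skinning style, the weights come from a Gaussian kernel with a per-control-point radius (the abstract only says the weights are learned), and every name here (`rot_z`, `blend_weights`, `warp_point`, `radii`, `k`) is hypothetical. The per-control-point rotations and translations are passed in directly, standing in for what the deformation MLP would predict at a given time.

```python
import math


def rot_z(theta):
    """Rotation matrix about the z axis (a stand-in for a general 3D rotation)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def mat_vec(R, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]


def blend_weights(x, control_points, radii, k):
    """Pick the k nearest control points and return normalized weights.

    Assumption: weights follow a Gaussian kernel of the distance, with one
    radius per control point. In a trained model these would be learned.
    """
    d2 = [sum((a - b) ** 2 for a, b in zip(x, p)) for p in control_points]
    nearest = sorted(range(len(control_points)), key=lambda i: d2[i])[:k]
    raw = [math.exp(-d2[i] / (2.0 * radii[i] ** 2)) for i in nearest]
    total = sum(raw)
    if total == 0.0:
        # All kernels underflowed: fall back to uniform weights.
        return nearest, [1.0 / len(nearest)] * len(nearest)
    return nearest, [w / total for w in raw]


def warp_point(x, control_points, rotations, translations, radii, k=2):
    """Blend the rigid motions of the k nearest control points at point x.

    Each control point i contributes R_i (x - p_i) + p_i + t_i, i.e. x is
    rotated about the control point and then translated with it.
    """
    idx, weights = blend_weights(x, control_points, radii, k)
    out = [0.0, 0.0, 0.0]
    for i, w in zip(idx, weights):
        p = control_points[i]
        local = [a - b for a, b in zip(x, p)]
        moved = mat_vec(rotations[i], local)
        for axis in range(3):
            out[axis] += w * (moved[axis] + p[axis] + translations[i][axis])
    return out


# Two control points; only the second one rotates and shifts upward.
control_points = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
rotations = [rot_z(0.0), rot_z(math.pi / 6)]
translations = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.2]]
radii = [0.5, 0.5]

warped = warp_point([0.8, 0.1, 0.0], control_points, rotations, translations, radii)
print(warped)
```

Because the query point sits closer to the second control point, its motion is dominated by that control point's transform. Editing a scene in this picture amounts to changing a few control-point transforms and letting the blend carry the many Gaussians along.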
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Novel View Synthesis | D-NeRF synthetic (test) | Average PSNR | 43.31 | 42 |
| Rendering Performance | TUM | Quality Score (fr3/sit_xyz) | 21.45 | 30 |
| Novel View Synthesis | NeRF-DS | Average PSNR | 24.1 | 26 |
| Novel View Synthesis | HyperNeRF (test) | PSNR | 26.95 | 18 |
| Dynamic View Synthesis | DyCheck 5 scenes, 1x resolution 1.0 (test) | mLPIPS | 0.49 | 11 |
| Novel View Rendering | N3DV Sear Steak | PSNR | 28.77 | 11 |
| Novel View Rendering | N3DV Flame Steak | PSNR | 23.49 | 11 |
| Novel View Rendering | N3DV Cook Spinach | PSNR | 17.2 | 11 |
| Novel View Rendering | N3DV Cut Roast Beef | PSNR | 6.29 | 11 |
| Dynamic View Synthesis | NeRF-DS (test) | PSNR | 22.25 | 10 |