CausalGS: Learning Physical Causality of 3D Dynamic Scenes with Gaussian Representations
About
Learning a physical model from video data that can comprehend physical laws and predict the future trajectories of objects is a formidable challenge in artificial intelligence. Prior approaches either use Partial Differential Equations (PDEs) as soft constraints in the form of PINN losses, or integrate physics simulators into neural networks; however, they often rely on strong priors or high-quality geometry reconstruction. In this paper, we propose CausalGS, a framework that learns the causal dynamics of complex dynamic 3D scenes solely from multi-view videos, without relying on explicit priors. At its core is an inverse physics inference module that decomposes the problem of recovering complex dynamics from video into the joint inference of two factors: the initial velocity field, which represents the scene's kinematics, and the intrinsic material properties, which govern its dynamics. The inferred physical quantities are then passed to a differentiable physics simulator, which guides learning in a physics-regularized manner. Extensive experiments demonstrate that CausalGS surpasses the state of the art on the highly challenging task of long-term future frame extrapolation, while also performing strongly on novel view interpolation. Crucially, our work shows that, without any human annotation, the model is able to learn the complex interactions between multiple physical properties and to understand the causal relationships driving the scene's dynamic evolution, solely from visual observations.
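The idea of jointly inferring an initial velocity and a material property by differentiating through a simulator can be illustrated with a toy example. The sketch below is not the CausalGS implementation: it assumes a single unit-mass point on a 1D spring, treats the stiffness `k` as a stand-in "material property", differentiates a symplectic Euler rollout by hand, and fits both unknowns with Gauss-Newton. All names (`simulate`, `fit`, `v0`, `k`) are hypothetical.

```python
import numpy as np

def simulate(v0, k, steps, dt=0.01, x0=0.0):
    """Roll out a unit-mass 1D spring with symplectic Euler.

    Returns the positions, shape (steps,), and their sensitivities
    with respect to (v0, k), shape (steps, 2).
    """
    x, v = x0, v0
    dx = np.zeros(2)            # d x / d(v0, k)
    dv = np.array([1.0, 0.0])   # d v / d(v0, k)
    xs, jac = [], []
    for _ in range(steps):
        # Velocity update v <- v - dt * k * x, differentiated by hand.
        dv = dv - dt * (k * dx + np.array([0.0, x]))
        v = v - dt * k * x
        # Position update x <- x + dt * v (uses the new velocity).
        dx = dx + dt * dv
        x = x + dt * v
        xs.append(x)
        jac.append(dx)
    return np.array(xs), np.array(jac)

def fit(observed, v0=0.8, k=3.0, iters=20):
    """Recover (v0, k) from observed positions with Gauss-Newton steps."""
    for _ in range(iters):
        pred, jac = simulate(v0, k, len(observed))
        residual = pred - observed
        # Solve the linearized least-squares problem jac @ step = -residual.
        step, *_ = np.linalg.lstsq(jac, -residual, rcond=None)
        v0, k = v0 + step[0], k + step[1]
    return v0, k

# Synthetic "observations" generated with hidden ground-truth parameters.
observed, _ = simulate(1.0, 4.0, steps=100)
v0_est, k_est = fit(observed)

# Extrapolate beyond the observed window using the inferred parameters.
future, _ = simulate(v0_est, k_est, steps=200)
print(f"estimated v0 = {v0_est:.4f}, estimated k = {k_est:.4f}")
```

In this toy setting the observations are positions rather than rendered images, so the sketch only mirrors the structure of the problem: infer kinematics and material parameters from data, then roll the simulator forward past the observed frames.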
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Future frame extrapolation | Dynamic Indoor Scene Dataset | PSNR | 36.748 | 24 |
| Novel view interpolation | Dynamic Indoor Scene Dataset | PSNR | 33.888 | 22 |
| Future frame extrapolation | Dynamic Object Dataset | PSNR | 34.517 | 22 |
| Novel view interpolation | Dynamic Object Dataset | PSNR | 40.002 | 20 |
| Future frame extrapolation | NVIDIA Dynamic Scene Skating | PSNR | 29.665 | 12 |
| Novel view interpolation | NVIDIA Dynamic Scene Truck | PSNR | 28.924 | 12 |
| Future frame extrapolation | NVIDIA Dynamic Scene Truck | PSNR | 30.104 | 12 |
| Novel view interpolation | NVIDIA Dynamic Scene Skating | PSNR | 28.583 | 12 |
| Unsupervised Object Segmentation | synthetic indoor scene dataset | AP | 99.82 | 7 |
| Future frame extrapolation | FreeGave-GoPro | PSNR | 28.267 | 6 |
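Most results above are reported as PSNR, in dB. As a reminder of what the metric measures, here is a minimal sketch of the standard definition, PSNR = 10 · log10(MAX² / MSE), assuming images stored as floating-point arrays in [0, 1]. The evaluation protocol actually used for these benchmarks (value range, masking, per-frame averaging) is not stated here, so the function name `psnr` and its `max_val` parameter are illustrative only.

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two same-shaped images."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    mse = np.mean((pred - target) ** 2)
    if mse == 0.0:
        # Identical images: PSNR is unbounded.
        return float("inf")
    return float(10.0 * np.log10(max_val ** 2 / mse))

# Example: a random image and a slightly noisy copy of it.
rng = np.random.default_rng(0)
target = rng.random((64, 64, 3))
pred = np.clip(target + rng.normal(scale=0.02, size=target.shape), 0.0, 1.0)
print(f"PSNR: {psnr(pred, target):.2f} dB")
```

Higher values mean a smaller mean squared error; each additional 10 dB corresponds to a tenfold reduction in MSE.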