MorpheuS: Neural Dynamic 360° Surface Reconstruction from Monocular RGB-D Video
About
Neural rendering has demonstrated remarkable success in dynamic scene reconstruction. Thanks to the expressiveness of neural representations, prior works can accurately capture the motion and achieve high-fidelity reconstruction of the target object. Despite this, real-world video scenarios often feature large unobserved regions where neural representations struggle to achieve realistic completion. To tackle this challenge, we introduce MorpheuS, a framework for dynamic 360° surface reconstruction from a casually captured RGB-D video. Our approach models the target scene as a canonical field that encodes its geometry and appearance, in conjunction with a deformation field that warps points from the current frame to the canonical space. We leverage a view-dependent diffusion prior and distill knowledge from it to achieve realistic completion of unobserved regions. Experimental results on various real-world and synthetic datasets show that our method can achieve high-fidelity 360° surface reconstruction of a deformable object from a monocular RGB-D video.
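The canonical-plus-deformation split described above can be illustrated with a toy sketch. This is not the MorpheuS implementation: the function names `deform_to_canonical`, `canonical_field`, and `query` are hypothetical, and the learned networks are replaced by assumed analytic stand-ins (a time-dependent translation and a sphere signed distance function) purely to show how a point observed in the current frame is warped to canonical space before geometry and appearance are queried.

```python
import numpy as np

def deform_to_canonical(x, t):
    """Hypothetical deformation field: maps points observed at time t
    into canonical space. A toy translation along the x-axis stands in
    for a learned network (assumption for illustration only)."""
    offset = np.array([0.5 * t, 0.0, 0.0])
    return x - offset

def canonical_field(x_c, radius=1.0):
    """Hypothetical canonical field: returns (sdf, rgb) for canonical points.
    A sphere SDF and a direction-based color stand in for the learned
    geometry and appearance (assumption for illustration only)."""
    norm = np.linalg.norm(x_c, axis=-1, keepdims=True)
    sdf = norm[..., 0] - radius
    # Map the unit direction from [-1, 1] to [0, 1] to get a color.
    rgb = 0.5 * (x_c / np.maximum(norm, 1e-8)) + 0.5
    return sdf, rgb

def query(x, t):
    """Warp current-frame points to canonical space, then query the field."""
    x_c = deform_to_canonical(x, t)
    return canonical_field(x_c)

# A point on the translated sphere's surface at time t = 2.0.
points = np.array([[2.0, 0.0, 0.0]])
sdf, rgb = query(points, t=2.0)
```

In this toy setup, the sphere has moved by one unit at `t = 2.0`, so the queried point lands on the canonical surface and its signed distance is zero. Keeping a single canonical field means geometry and appearance are shared across all frames, with per-frame motion handled entirely by the warp.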
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| 3D surface reconstruction | KillingFusion (per-scene) | Accuracy (cm) | 0.77 | 6 |
| 3D surface reconstruction | DeepDeform (per-scene) | Accuracy (cm) | 0.57 | 6 |
| 3D surface reconstruction | iPhone (per-scene) | Error (cm) | 0.77 | 6 |
| Deformable 3D Reconstruction | AMA | Accuracy (cm) | 1.71 | 4 |
| Deformable 3D Reconstruction | BANMo | Acc (cm) | 2.16 | 4 |
| Non-rigid reconstruction | KillingFusion (offline) | Depth L1 Error (cm) | 3.2 | 2 |
| Non-rigid reconstruction | DeepDeform (offline) | Depth L1 Error (cm) | 1.9 | 2 |
| Non-rigid reconstruction | iPhone (offline) | Depth L1 (cm) | 2.4 | 2 |