AutoSDF: Shape Priors for 3D Completion, Reconstruction and Generation
About
Powerful priors allow us to perform inference with insufficient information. In this paper, we propose an autoregressive prior for 3D shapes to solve multimodal 3D tasks such as shape completion, reconstruction, and generation. We model the distribution over 3D shapes as a non-sequential autoregressive distribution over a discretized, low-dimensional, symbolic grid-like latent representation of 3D shapes. This enables us to represent distributions over 3D shapes conditioned on information from an arbitrary set of spatially anchored query locations and thus perform shape completion in such arbitrary settings (e.g., generating a complete chair given only a view of the back leg). We also show that the learned autoregressive prior can be leveraged for conditional tasks such as single-view reconstruction and language-based generation. This is achieved by learning task-specific naive conditionals which can be approximated by light-weight models trained on minimal paired data. We validate the effectiveness of the proposed method using both quantitative and qualitative evaluation and show that the proposed method outperforms the specialized state-of-the-art methods trained for individual tasks. The project page with code and video visualizations can be found at https://yccyenchicheng.github.io/AutoSDF/.
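The factorization described in the abstract can be sketched as follows. The notation is an assumption introduced here for illustration, not taken from this page: $Z = \{z_1, \dots, z_N\}$ denotes the discrete latent symbols on the grid, $g$ is a permutation of the $N$ grid positions, $O$ is the set of observed positions, and $C$ is a conditioning input such as an image or a text prompt. The last line is one plausible reading of the "naive conditionals", in which each symbol depends on $C$ independently of the other symbols.

```latex
% Non-sequential autoregressive prior: any ordering g of the grid positions is allowed
p(Z) = \prod_{j=1}^{N} p\left(z_{g_j} \mid z_{g_1}, \dots, z_{g_{j-1}}\right)

% Shape completion: order the observed positions O first (g_1, \dots, g_{|O|}),
% then generate the remaining symbols one at a time
p\left(Z_{\bar{O}} \mid Z_O\right)
  = \prod_{j=|O|+1}^{N} p\left(z_{g_j} \mid z_{g_1}, \dots, z_{g_{j-1}}\right)

% Conditional tasks (assumed form): combine the learned prior p_\theta
% with a lightweight per-symbol conditional p_\phi
p\left(z_{g_j} \mid z_{g_1}, \dots, z_{g_{j-1}}, C\right)
  \propto p_\theta\left(z_{g_j} \mid z_{g_1}, \dots, z_{g_{j-1}}\right)
  \cdot p_\phi\left(z_{g_j} \mid C\right)
```

Because the ordering $g$ is not fixed to a raster scan, the same prior can condition on any subset of grid positions, which is what makes completion from an arbitrary partial observation possible.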
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Single-view 3D Reconstruction | Pix3D (test) | IoU 0.521 | 16 |
| Single-view 3D Object Reconstruction | ShapeNet (test) | -- | 10 |
| Text-to-Shape Generation | ShapeNet13 | FID 7.33e+3 | 9 |
| Single-view 3D Reconstruction | ShapeNet chairs | Chamfer Distance (CD) 25 | 8 |
| Language-guided 3D shape generation | ShapeNet (test) | P(Tr) 0.66 | 7 |
| Shape completion | ShapeNet v1 (test) | UHD 0.0567 | 6 |
| Single-view Reconstruction | Pix3D | CD 2.267 | 5 |
| Recursive Text-conditioned 3D Shape Generation | ShapeGlot [1, 2] phrases | CLIP-S Score 45.72 | 4 |
| Text-to-Shape Generation | Text2Shape | Accuracy 83.88 | 4 |
| Shape completion | ShapeNet (test) | UHD 0.0567 | 3 |