CAT3D: Create Anything in 3D with Multi-View Diffusion Models
About
Advances in 3D reconstruction have enabled high-quality 3D capture, but they require a user to collect hundreds to thousands of images to create a 3D scene. We present CAT3D, a method for creating anything in 3D by simulating this real-world capture process with a multi-view diffusion model. Given any number of input images and a set of target novel viewpoints, our model generates highly consistent novel views of a scene. These generated views can be used as input to robust 3D reconstruction techniques to produce 3D representations that can be rendered from any viewpoint in real time. CAT3D can create entire 3D scenes in as little as one minute, and it outperforms existing methods for single-image and few-view 3D scene creation. See our project page for results and interactive demos at https://cat3d.github.io.
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Novel View Synthesis | LLFF | PSNR 25.63 | 124 |
| Novel View Synthesis | RealEstate10K | PSNR 32.2 | 116 |
| Novel View Synthesis | Mip-NeRF360 | PSNR 18.67 | 104 |
| Novel View Synthesis | DTU | PSNR 25.92 | 100 |
| Novel View Synthesis | Tanks&Temples | PSNR 12.525 | 52 |
| Novel View Synthesis | CO3D | PSNR 23.58 | 24 |
| Few-view 3D Reconstruction | RealEstate10K (test) | PSNR 32.2 | 20 |
| Few-view 3D Reconstruction | LLFF (out-of-distribution) | PSNR 25.63 | 12 |
| Few-view 3D Reconstruction | DTU (out-of-distribution) | PSNR 25.92 | 12 |
| Few-view 3D Reconstruction | Co3D (test) | PSNR 23.58 | 12 |
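
All results in the table are reported as PSNR (peak signal-to-noise ratio), where higher is better. For readers unfamiliar with the metric, the following is a minimal sketch of the standard PSNR definition, assuming images are given as flat sequences of pixel intensities in the range [0, `max_value`]. The function name `psnr` and its parameters are illustrative; this is not the evaluation code behind the benchmarks, which may differ in color handling, cropping, or averaging.

```python
import math


def psnr(reference, rendered, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Both images are flat sequences of pixel intensities in [0, max_value].
    (Hypothetical helper for illustration only.)
    """
    if len(reference) != len(rendered) or len(reference) == 0:
        raise ValueError("images must be non-empty and the same size")

    # Mean squared error over all pixel values.
    mse = sum((r - p) ** 2 for r, p in zip(reference, rendered)) / len(reference)

    # Identical images have zero error, so PSNR is unbounded.
    if mse == 0:
        return math.inf

    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    ground_truth = [0.0, 0.5, 1.0, 0.25]
    prediction = [0.1, 0.4, 0.9, 0.35]
    print(f"PSNR: {psnr(ground_truth, prediction):.2f} dB")
```

Because PSNR is logarithmic in the mean squared error, a gain of 10 dB corresponds to a tenfold reduction in MSE, which helps when comparing the scores across datasets above.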