RGB2Point: 3D Point Cloud Generation from Single RGB Images
About
We introduce RGB2Point, a Transformer-based method for generating a 3D point cloud from a single unposed RGB image. RGB2Point takes an input image of an object and generates a dense 3D point cloud. In contrast to prior work based on CNN layers and diffusion denoising, we use pre-trained Transformer layers that are fast and generate high-quality point clouds with consistent quality across the available categories. On a real-world dataset, our generated point clouds improve Chamfer distance by 51.15% and Earth Mover's distance by 45.96% over the current state of the art. On a synthetic dataset, our approach improves Chamfer distance by 39.26%, Earth Mover's distance by 26.95%, and F-score by 47.16%. Our method also produces 63.1% more consistent high-quality results across object categories than prior works. Finally, RGB2Point is computationally efficient: it requires only 2.3 GB of VRAM to reconstruct a 3D point cloud from a single RGB image, and our implementation generates results 15,133x faster than a SOTA diffusion-based model.
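For readers unfamiliar with the metrics named above, the following is a minimal sketch of how Chamfer distance and F-score are commonly computed between a predicted and a ground-truth point cloud. It is not the authors' evaluation code: the function names are hypothetical, and the conventions used here (squared distances averaged per direction, a brute-force nearest-neighbour search) are assumptions, since implementations differ on these details. The `relative_improvement` helper assumes the reported percentages are relative changes against a baseline, which the abstract does not state explicitly. Earth Mover's distance is omitted because it requires solving an optimal assignment problem.

```python
import math


def _nearest_sq_dists(src, dst):
    """For each point in src, squared Euclidean distance to its nearest point in dst."""
    return [
        min(sum((a - b) ** 2 for a, b in zip(p, q)) for q in dst)
        for p in src
    ]


def chamfer_distance(pred, gt):
    """Symmetric Chamfer distance (assumed convention: mean squared distance per direction)."""
    d_pred_to_gt = _nearest_sq_dists(pred, gt)
    d_gt_to_pred = _nearest_sq_dists(gt, pred)
    return sum(d_pred_to_gt) / len(pred) + sum(d_gt_to_pred) / len(gt)


def f_score(pred, gt, threshold):
    """Harmonic mean of precision and recall at a given distance threshold."""
    precision = sum(
        math.sqrt(d) <= threshold for d in _nearest_sq_dists(pred, gt)
    ) / len(pred)
    recall = sum(
        math.sqrt(d) <= threshold for d in _nearest_sq_dists(gt, pred)
    ) / len(gt)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def relative_improvement(baseline, ours):
    """Percentage reduction of a lower-is-better metric relative to a baseline."""
    return 100.0 * (baseline - ours) / baseline


# Toy example: two points each, with one point offset by 0.1 along z.
pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.1)]
cd = chamfer_distance(pred, gt)
fs = f_score(pred, gt, threshold=0.05)
```

The brute-force search is quadratic in the number of points, so a practical evaluation on dense point clouds would normally swap in a spatial index or a GPU implementation.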
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Single-view Reconstruction | Pix3D | CD | 0.0706 | 11 |
| Single-view Point Cloud Reconstruction | ShapeNet R2N2 | CD (Car) | 4.22 | 9 |
| Single-image point cloud reconstruction | ShapeNet R2N2 | F-Score @ 1% (Airplane) | 58.3 | 5 |
| Sketch-to-3D shape generation | ShapeNet-Sketch v1 (test) | EMD (Chair) | 2.2 | 5 |
| Point cloud generation | ShapeNet R2N2 | Generation Time (ms/sample) | 28.39 | 4 |
| Object Detection | MAN TruckScenes | mAP | 0.00e+0 | 3 |
| Radar point cloud generation | MAN TruckScenes Foreground 1.0 (val) | Hit Rate | 37 | 2 |