MGP-KAD: Multimodal Geometric Priors and Kolmogorov-Arnold Decoder for Single-View 3D Reconstruction in Complex Scenes

About

Single-view 3D reconstruction in complex real-world scenes is challenging due to noise, object diversity, and limited dataset availability. To address these challenges, we propose MGP-KAD, a novel multimodal feature fusion framework that integrates RGB features with a geometric prior to enhance reconstruction accuracy. The geometric prior is generated by sampling and clustering ground-truth object data, producing class-level features that are dynamically adjusted during training to improve geometric understanding. Additionally, we introduce a hybrid decoder based on Kolmogorov-Arnold Networks (KAN) to overcome the limitations of traditional linear decoders in processing complex multimodal inputs. Extensive experiments on the Pix3D dataset demonstrate that MGP-KAD achieves state-of-the-art (SOTA) performance, significantly improving geometric integrity, smoothness, and detail preservation. Our work provides a robust and effective solution for advancing single-view 3D reconstruction in complex scenes.
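The abstract describes the geometric prior only at a high level (sampling and clustering ground-truth object data into class-level features). The sketch below is one possible reading of that step, not the authors' implementation: it assumes each ground-truth shape is available as an (N, 3) point array, subsamples each shape, and runs plain k-means over the pooled points of one class. The function names, the choice of k-means, and the parameters `n_samples` and `k` are all hypothetical, and the dynamic adjustment during training is not covered.

```python
import numpy as np


def sample_points(points, n_samples, rng):
    """Uniformly subsample n_samples points (with replacement if too few)."""
    replace = len(points) < n_samples
    idx = rng.choice(len(points), size=n_samples, replace=replace)
    return points[idx]


def kmeans(points, k, n_iters=20, rng=None):
    """Plain Lloyd's k-means; returns a (k, 3) array of centroids."""
    if rng is None:
        rng = np.random.default_rng(0)
    # Initialise centroids from k distinct input points (fancy indexing copies).
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        # Squared distance from every point to every centroid: shape (N, k).
        d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids


def class_prior(shapes, n_samples=256, k=8, seed=0):
    """Hypothetical class-level prior: k cluster centres over one class's shapes."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([sample_points(s, n_samples, rng) for s in shapes])
    return kmeans(pooled, k, rng=rng)


# Usage with synthetic stand-ins for the ground-truth shapes of one class.
demo_rng = np.random.default_rng(1)
demo_shapes = [demo_rng.normal(size=(500, 3)) for _ in range(3)]
demo_prior = class_prior(demo_shapes)
```

Under these assumptions, `demo_prior` is a fixed-size (8, 3) array per class, which is the kind of compact class-level feature that could be fused with image features.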

Luoxi Zhang, Chun Xie, Itaru Kitahara • 2026

Related benchmarks

Task | Dataset | Metric | Result | Rank
3D Shape Reconstruction | Pix3D (test) | F-Score | 62.14 | 9
Object Reconstruction (Chamfer Distance ↓) | Pix3D (test) | Mean CD | 19.64 | 5
Object Reconstruction (Normal Consistency ↑) | Pix3D (test) | Normal Consistency (NC) | 80.5 | 5
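For readers unfamiliar with the metrics listed above, the following is a minimal sketch of how Chamfer distance and F-score are commonly computed between two point clouds. It is an illustration under stated assumptions, not the evaluation code behind the reported numbers: the distance convention (unsquared Euclidean, mean over points), the threshold `tau`, and any scaling of the results are assumptions, and this page does not state which conventions the benchmark uses.

```python
import numpy as np


def pairwise_dist(a, b):
    """Euclidean distances between all points of a (N, 3) and b (M, 3)."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)


def chamfer_distance(pred, gt):
    """Symmetric Chamfer distance: mean nearest-neighbour distance both ways."""
    d = pairwise_dist(pred, gt)
    return d.min(axis=1).mean() + d.min(axis=0).mean()


def f_score(pred, gt, tau):
    """Harmonic mean of precision and recall at distance threshold tau."""
    d = pairwise_dist(pred, gt)
    precision = (d.min(axis=1) < tau).mean()  # predicted points near the GT
    recall = (d.min(axis=0) < tau).mean()     # GT points near the prediction
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Tiny worked example: one point matches exactly, the other is off by 0.5.
pred = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
gt = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.5]])
cd = chamfer_distance(pred, gt)   # 0.25 + 0.25 = 0.5
fs = f_score(pred, gt, tau=0.3)   # precision = recall = 0.5, so F = 0.5
```

Lower Chamfer distance and higher F-score both indicate a closer match, which agrees with the arrows in the task names above.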
