CoRe3D: Collaborative Reasoning as a Foundation for 3D Intelligence
About
Recent advances in large multimodal models suggest that explicit reasoning mechanisms play a critical role in improving model reliability, interpretability, and cross-modal alignment. While such reasoning-centric approaches have proven effective in language and vision tasks, their extension to 3D remains underdeveloped. CoRe3D introduces a unified reasoning framework for 3D understanding and generation that jointly operates over semantic and spatial abstractions, enabling high-level intent inferred from language to directly guide low-level 3D content formation. Central to this design is a spatially grounded reasoning representation that decomposes the 3D latent space into localized regions, allowing the model to reason over geometry in a compositional and procedural manner. By tightly coupling semantic chain-of-thought inference with structured spatial reasoning, CoRe3D produces 3D outputs that exhibit strong local consistency and faithful alignment with linguistic descriptions.
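The abstract does not specify how the latent decomposition is implemented; as a minimal sketch, the idea of splitting a 3D latent grid into localized regions (and reassembling them after per-region reasoning) could look like the following. The grid shape, region size, and function names are illustrative assumptions, not CoRe3D's actual API.

```python
import numpy as np

def decompose_latent(latent, region=4):
    """Split a 3D latent grid into non-overlapping local regions.

    latent: array of shape (D, H, W, C), with D, H, W divisible by `region`.
    Returns an array of shape (num_regions, region, region, region, C),
    one entry per localized region (hypothetical layout).
    """
    D, H, W, C = latent.shape
    r = region
    blocks = latent.reshape(D // r, r, H // r, r, W // r, r, C)
    # Bring the three block indices together, then the within-block axes.
    blocks = blocks.transpose(0, 2, 4, 1, 3, 5, 6)
    return blocks.reshape(-1, r, r, r, C)

def recompose_latent(blocks, shape, region=4):
    """Inverse of decompose_latent: reassemble regions into the full grid."""
    D, H, W, C = shape
    r = region
    blocks = blocks.reshape(D // r, H // r, W // r, r, r, r, C)
    blocks = blocks.transpose(0, 3, 1, 4, 2, 5, 6)
    return blocks.reshape(D, H, W, C)

# Round-trip check on a toy 8x8x8 latent grid with 16 channels.
latent = np.random.rand(8, 8, 8, 16)
regions = decompose_latent(latent, region=4)
print(regions.shape)                                   # (8, 4, 4, 4, 16)
print(np.allclose(recompose_latent(regions, latent.shape, region=4), latent))
```

A per-region reasoning step would then map over the first axis of `regions` before recomposition, which is what makes the processing compositional: each region can be updated locally while the recomposition preserves global layout.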
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Language Understanding | MMLU | Accuracy | 67.6 | 756 |
| Physical Commonsense Reasoning | PIQA | Accuracy | 79.4 | 329 |
| Social Commonsense Reasoning | SIQA | Accuracy | 41.5 | 32 |
| 3D Object Captioning | Objaverse (held-out set) | BLEU-1 | 24.02 | 7 |
| Image-to-3D | Objaverse | CLIP Score | 0.86 | 5 |
| Text-to-3D | Objaverse | CLIP Score | 0.3 | 5 |