CoRe3D: Collaborative Reasoning as a Foundation for 3D Intelligence
About
Recent advances in large multimodal models suggest that explicit reasoning mechanisms play a critical role in improving model reliability, interpretability, and cross-modal alignment. While such reasoning-centric approaches have proven effective in language and vision tasks, their extension to 3D remains underdeveloped. CoRe3D introduces a unified reasoning framework for 3D understanding and generation that jointly operates over semantic and spatial abstractions, enabling high-level intent inferred from language to directly guide low-level 3D content formation. Central to this design is a spatially grounded reasoning representation that decomposes the 3D latent space into localized regions, allowing the model to reason over geometry in a compositional and procedural manner. By tightly coupling semantic chain-of-thought inference with structured spatial reasoning, CoRe3D produces 3D outputs that exhibit strong local consistency and faithful alignment with linguistic descriptions.
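The abstract does not specify how the latent decomposition is implemented; as a minimal sketch, the idea of splitting a 3D latent grid into localized regions (and reassembling them after per-region reasoning) could look like the following. The grid shape, region size, and function names are illustrative assumptions, not CoRe3D's actual API.

```python
import numpy as np

def decompose_latent(latent, region=4):
    """Split a 3D latent grid into non-overlapping local regions.

    latent: array of shape (D, H, W, C), with D, H, W divisible by `region`.
    Returns an array of shape (num_regions, region, region, region, C),
    one entry per localized region (hypothetical layout).
    """
    D, H, W, C = latent.shape
    r = region
    blocks = latent.reshape(D // r, r, H // r, r, W // r, r, C)
    # Bring the three block indices together, then the within-block axes.
    blocks = blocks.transpose(0, 2, 4, 1, 3, 5, 6)
    return blocks.reshape(-1, r, r, r, C)

def recompose_latent(blocks, shape, region=4):
    """Inverse of decompose_latent: reassemble regions into the full grid."""
    D, H, W, C = shape
    r = region
    blocks = blocks.reshape(D // r, H // r, W // r, r, r, r, C)
    blocks = blocks.transpose(0, 3, 1, 4, 2, 5, 6)
    return blocks.reshape(D, H, W, C)

# Round-trip check on a toy 8x8x8 latent grid with 16 channels.
latent = np.random.rand(8, 8, 8, 16)
regions = decompose_latent(latent, region=4)
print(regions.shape)                                   # (8, 4, 4, 4, 16)
print(np.allclose(recompose_latent(regions, latent.shape, region=4), latent))
```

A per-region reasoning step would then map over the first axis of `regions` before recomposition, which is what makes the processing compositional: each region can be updated locally while the recomposition preserves global layout.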
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Language Understanding | MMLU | Accuracy | 67.6 | 756 |
| Physical Commonsense Reasoning | PIQA | Accuracy | 79.4 | 329 |
| Social Commonsense Reasoning | SIQA | Accuracy | 41.5 | 32 |
| 3D Object Captioning | Objaverse (held-out set) | BLEU-1 | 24.02 | 7 |
| Image-to-3D | Objaverse | CLIP Score | 0.86 | 5 |
| Text-to-3D | Objaverse | CLIP Score | 0.3 | 5 |