Physically Ground Commonsense Knowledge for Articulated Object Manipulation with Analytic Concepts

About

We humans rely on a wide range of commonsense knowledge to interact with objects of many kinds and categories in the physical world. Such commonsense knowledge is likewise crucial for robots that must develop generalized object manipulation skills. Although recent Multi-modal Large Language Models (MLLMs) have shown impressive capabilities in acquiring commonsense knowledge and performing commonsense reasoning, grounding the semantic-level knowledge they produce in the physical world, so that it can fully guide robots in generalized articulated object manipulation, remains a challenge that has not been sufficiently addressed. To this end, we introduce analytic concepts: concepts procedurally defined in mathematical symbolism that machines can directly compute and simulate. Using analytic concepts as a bridge between the semantic-level knowledge inferred by MLLMs and the physical world in which real robots operate, we derive knowledge of object structure and functionality as physics-informed representations, and then use this physically grounded knowledge to guide robot control policies toward generalized and accurate articulated object manipulation. Extensive experiments in both the real world and simulation demonstrate the superiority of our approach.
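The abstract describes analytic concepts only at a high level. As a rough illustration, and not the authors' implementation, the sketch assumes that a concept for a hinged part can be written as a rotation axis (a point plus a direction) and that "computing" the concept means generating the path a grasped point follows as the part opens, here via Rodrigues' rotation formula. A sliding part such as a drawer is treated analogously as a translation direction. The class names `RevoluteConcept` and `PrismaticConcept` and their methods are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

# Small pure-Python vector helpers (no external dependencies).
def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _add(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])

def _scale(a: Vec3, s: float) -> Vec3:
    return (a[0] * s, a[1] * s, a[2] * s)

def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _normalize(a: Vec3) -> Vec3:
    n = math.sqrt(_dot(a, a))
    if n == 0.0:
        raise ValueError("direction must be non-zero")
    return _scale(a, 1.0 / n)

@dataclass
class RevoluteConcept:
    """Hypothetical analytic concept for a hinge: a rotation axis in space."""
    origin: Vec3     # any point on the hinge axis
    direction: Vec3  # axis direction (normalized on use)

    def rotate(self, point: Vec3, angle: float) -> Vec3:
        """Rotate `point` about the axis by `angle` radians (Rodrigues' formula)."""
        k = _normalize(self.direction)
        v = _sub(point, self.origin)
        c, s = math.cos(angle), math.sin(angle)
        # v_rot = v cos(t) + (k x v) sin(t) + k (k . v)(1 - cos(t))
        v_rot = _add(_add(_scale(v, c), _scale(_cross(k, v), s)),
                     _scale(k, _dot(k, v) * (1.0 - c)))
        return _add(self.origin, v_rot)

    def trajectory(self, grasp: Vec3, total_angle: float, steps: int) -> List[Vec3]:
        """Evenly spaced waypoints a gripper holding `grasp` follows while opening."""
        return [self.rotate(grasp, total_angle * i / steps) for i in range(steps + 1)]

@dataclass
class PrismaticConcept:
    """Hypothetical analytic concept for a sliding joint (e.g. a drawer)."""
    direction: Vec3  # sliding direction (normalized on use)

    def trajectory(self, grasp: Vec3, distance: float, steps: int) -> List[Vec3]:
        """Evenly spaced waypoints along a straight pull of length `distance`."""
        d = _normalize(self.direction)
        return [_add(grasp, _scale(d, distance * i / steps)) for i in range(steps + 1)]
```

For example, `RevoluteConcept(origin=(0, 0, 0), direction=(0, 0, 1)).trajectory((1, 0, 0), math.pi / 2, 2)` yields waypoints at approximately (1, 0, 0), (0.707, 0.707, 0) and (0, 1, 0), a quarter-turn of a door handle about a vertical hinge. In this simplified picture, a language model would only have to name the concept type and estimate its parameters; the resulting motion is then computed exactly rather than guessed.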

Jiude Wei, Yuxuan Li, Cewu Lu, Jianhua Sun • 2025

Related benchmarks

Task	Dataset	Result
Articulated Object Manipulation	PartNet-mobility Categories v1 (test)	Bkt Score: 28.9	6
Articulated Object Manipulation	PartNet-mobility v1 (train)	Box: 15.9	6
Robot Manipulation	Real-world household objects	Box Score: 90	2
