Articulate-Anything: Automatic Modeling of Articulated Objects via a Vision-Language Foundation Model

About

Interactive 3D simulated objects are crucial in AR/VR, animations, and robotics, driving immersive experiences and advanced automation. However, creating these articulated objects requires extensive human effort and expertise, limiting their broader applications. To overcome this challenge, we present Articulate-Anything, a system that automates the articulation of diverse, complex objects from many input modalities, including text, images, and videos. Articulate-Anything leverages vision-language models (VLMs) to generate code that can be compiled into an interactable digital twin for use in standard 3D simulators. Our system exploits existing 3D asset datasets via a mesh retrieval mechanism, along with an actor-critic system that iteratively proposes, evaluates, and refines solutions for articulating the objects, self-correcting errors to achieve a robust outcome. Qualitative evaluations demonstrate Articulate-Anything's capability to articulate complex and even ambiguous object affordances by leveraging rich grounded inputs. In extensive quantitative experiments on the standard PartNet-Mobility dataset, Articulate-Anything substantially outperforms prior work, increasing the success rate from 8.7-11.6% to 75% and setting a new bar for state-of-the-art performance. We further showcase the utility of our system by generating 3D assets from in-the-wild video inputs, which are then used to train robotic policies for fine-grained manipulation tasks in simulation that go beyond basic pick and place. These policies are then transferred to a real robotic system.
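The propose, evaluate, refine loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `Attempt`, `refine`, `toy_actor`, and `toy_critic`, the score threshold, and the iteration cap are all hypothetical, and the toy functions stand in for the VLM actor and critic, whose real interfaces are not given here.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Attempt:
    program: str   # candidate articulation code proposed by the actor
    score: float   # critic's rating in [0, 1] (assumed scale)
    feedback: str  # critic's explanation, handed to the next proposal


def refine(
    actor: Callable[[Optional[Attempt]], str],
    critic: Callable[[str], Tuple[float, str]],
    threshold: float = 0.9,  # hypothetical acceptance score
    max_iters: int = 5,      # hypothetical iteration budget
) -> Attempt:
    """Propose, evaluate, and refine; return the best-scoring attempt."""
    if max_iters < 1:
        raise ValueError("max_iters must be at least 1")
    best: Optional[Attempt] = None
    previous: Optional[Attempt] = None
    for _ in range(max_iters):
        program = actor(previous)           # propose, seeing the last attempt
        score, feedback = critic(program)   # evaluate the proposal
        previous = Attempt(program, score, feedback)
        if best is None or score > best.score:
            best = previous
        if score >= threshold:              # good enough, stop refining
            break
    return best


def toy_actor(previous: Optional[Attempt]) -> str:
    # Stand-in for a VLM actor: the first guess uses the wrong axis,
    # and any later guess switches axis in response to feedback.
    if previous is None:
        return "joint(type='revolute', axis='x')"
    return "joint(type='revolute', axis='z')"


def toy_critic(program: str) -> Tuple[float, str]:
    # Stand-in for a VLM critic comparing the result with the input.
    if "axis='z'" in program:
        return 1.0, "motion matches the input"
    return 0.2, "part rotates about the wrong axis"


result = refine(toy_actor, toy_critic)
print(result.program, result.score)
```

Returning the best attempt rather than the last one means a later, worse proposal cannot overwrite an earlier, better one when the iteration budget runs out.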

Long Le, Jason Xie, William Liang, Hung-Ju Wang, Yue Yang, Yecheng Jason Ma, Kyle Vedder, Arjun Krishna, Dinesh Jayaraman, Eric Eaton • 2024

Related benchmarks

Task	Dataset	Result
Articulated Object Modeling	AiP real	Axis Alignment Error (Deg) 90	22
Articulated Object Reconstruction	PartNet-Mobility (test)	RS-dgIoU 0.4731	11
Articulated Object Generation	ACD (test)	RS-dgIoU 0.7574	9
Collision Graph Prediction	PartNet-Mobility (test)	AOR 0.0025	9
Reconstruction	Video2Articulation-S	CD-w 0.11	8
Single-View Articulated Object Generation	ACD	RS dgIoU 126.8	7
Part-Decomposed Single-View Articulated Object Generation	PartNet-Mobility (test)	RS-gIoU 0.6865	7
Free-moving Articulated Object Reconstruction	FreeArt-21 Revolute 1.0 (test)	Axis Alignment Error (deg) 42	7
Free-moving Articulated Object Reconstruction	FreeArt-21 Prismatic 1.0 (test)	Axis Error (deg) 45	7
Joint parameter estimation	Diverse 3D Assets (test)	Type Error 0.21	6

Showing 10 of 39 rows
