Articulate-Anything: Automatic Modeling of Articulated Objects via a Vision-Language Foundation Model

About

Interactive 3D simulated objects are crucial in AR/VR, animations, and robotics, driving immersive experiences and advanced automation. However, creating these articulated objects requires extensive human effort and expertise, limiting their broader applications. To overcome this challenge, we present Articulate-Anything, a system that automates the articulation of diverse, complex objects from many input modalities, including text, images, and videos. Articulate-Anything leverages vision-language models (VLMs) to generate code that can be compiled into an interactable digital twin for use in standard 3D simulators. Our system exploits existing 3D asset datasets via a mesh retrieval mechanism, along with an actor-critic system that iteratively proposes, evaluates, and refines solutions for articulating the objects, self-correcting errors to achieve a robust outcome. Qualitative evaluations demonstrate Articulate-Anything's capability to articulate complex and even ambiguous object affordances by leveraging rich grounded inputs. In extensive quantitative experiments on the standard PartNet-Mobility dataset, Articulate-Anything substantially outperforms prior work, increasing the success rate from 8.7-11.6% to 75% and setting a new bar for state-of-the-art performance. We further showcase the utility of our system by generating 3D assets from in-the-wild video inputs, which are then used to train robotic policies for fine-grained manipulation tasks in simulation that go beyond basic pick and place. These policies are then transferred to a real robotic system.
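The propose, evaluate, refine cycle described above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the interfaces, so `ToyActor`, `toy_critic`, `refine`, the score threshold, and the iteration cap are all hypothetical. The VLM actor and critic are replaced by simple numeric stand-ins (the "proposal" is a single joint limit in degrees rather than generated code) so that the control flow can be run end to end.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Attempt:
    proposal: float   # toy stand-in for generated articulation code
    score: float      # critic rating in [0, 1]
    iteration: int    # 1-based index of the iteration that produced it


class ToyActor:
    """Hypothetical actor: bisects a joint-limit guess using critic hints."""

    def __init__(self, lo: float = 0.0, hi: float = 180.0) -> None:
        self.lo, self.hi = lo, hi
        self.last: Optional[float] = None

    def propose(self, feedback: Optional[str]) -> float:
        # Narrow the search interval according to the previous feedback.
        if feedback == "increase":
            self.lo = self.last
        elif feedback == "decrease":
            self.hi = self.last
        self.last = (self.lo + self.hi) / 2
        return self.last


def toy_critic(proposal: float, target: float = 120.0) -> Tuple[float, str]:
    """Hypothetical critic: scores a proposal and says which way to move."""
    error = abs(proposal - target)
    score = max(0.0, 1.0 - error / target)
    hint = "increase" if proposal < target else "decrease"
    return score, hint


def refine(
    propose: Callable[[Optional[str]], float],
    evaluate: Callable[[float], Tuple[float, str]],
    threshold: float = 0.95,
    max_iters: int = 10,
) -> Attempt:
    """Loop until the critic's score reaches the threshold or the cap is hit."""
    feedback: Optional[str] = None
    best: Optional[Attempt] = None
    for i in range(1, max_iters + 1):
        proposal = propose(feedback)
        score, feedback = evaluate(proposal)
        if best is None or score > best.score:
            best = Attempt(proposal, score, i)
        if score >= threshold:
            break
    return best


actor = ToyActor()
result = refine(actor.propose, toy_critic)
print(result)
```

In the system the abstract describes, the actor and critic would both be vision-language models and the proposal would be code compiled into a simulator asset; only the loop structure carries over from this sketch.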

Long Le, Jason Xie, William Liang, Hung-Ju Wang, Yue Yang, Yecheng Jason Ma, Kyle Vedder, Arjun Krishna, Dinesh Jayaraman, Eric Eaton • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Reconstruction | Video2Articulation-S | CD-w: 0.11 | 8 |
| Single-View Articulated Object Generation | ACD | RS dgIoU: 126.8 | 7 |
| Part-Decomposed Single-View Articulated Object Generation | PartNet-Mobility (test) | RS-gIoU: 0.6865 | 7 |
| Free-moving Articulated Object Reconstruction | FreeArt-21 Revolute 1.0 (test) | Axis Alignment Error (deg): 42 | 7 |
| Free-moving Articulated Object Reconstruction | FreeArt-21 Prismatic 1.0 (test) | Axis Error (deg): 45 | 7 |
| Joint parameter estimation | Diverse 3D Assets (test) | Type Error: 0.21 | 6 |
| Physical Executability | Diverse 3D Assets (test) | Executability: 46 | 6 |
| Part Segmentation | Diverse 3D Assets (test) | mIoU: 0.47 | 6 |
| 3D Articulation and Geometry Estimation | SIMART-Bench (In-Domain Items) | Type Score: 89.1 | 5 |
| 3D Articulation and Geometry Estimation | SIMART-Bench AI-generated Items (Out-of-Distribution) | Type Accuracy: 76.5 | 5 |
Showing 10 of 24 rows
