Articulation in Prime: Primitive-Based Articulated Object Understanding from a Single Casual Video

About

Retrieving the 3D kinematics of articulated objects from monocular video is a fundamental challenge in computer vision. Existing methods rely on complex video setups or cues such as long-term point tracking or wide-baseline matching, but are frequently brittle under severe occlusions, rapid camera ego-motion, or weak local features. Learning-based methods, meanwhile, struggle to generalize beyond their training categories. We propose a category-agnostic optimization framework that treats articulated object understanding as a primitive-fitting problem. Geometric primitives serve as a proxy representation that avoids the pitfalls of unstable point tracks; a novel mechanism organizes them into coherent parts constrained by revolute and prismatic joints. Our formulation jointly optimizes part segmentation and joint parameters, recovering complex kinematics from a single casually captured video. A visibility-aware procedure handles partial observations and occlusions inherent to real-world data. We also propose the AiP-synth and AiP-real benchmarks, featuring significant camera motion and heavy occlusions, and outperform existing methods. Project page: https://aartykov.github.io/Articulation-in-Prime/
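To make the two joint types named in the abstract concrete, here is a minimal sketch of how a point on a part moves under a revolute or a prismatic joint. It is an illustration under stated assumptions, not the authors' implementation: the function names `revolute` and `prismatic`, and the parameterization (axis direction, pivot point, angle or displacement), are hypothetical choices made for this example.

```python
import math


def _normalize(v):
    """Return v scaled to unit length; reject the zero vector."""
    n = math.sqrt(sum(c * c for c in v))
    if n == 0.0:
        raise ValueError("axis must be non-zero")
    return tuple(c / n for c in v)


def _cross(a, b):
    """Cross product of two 3-vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def _dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def revolute(point, axis, pivot, angle):
    """Rotate `point` by `angle` radians about the line through `pivot`
    with direction `axis` (Rodrigues' rotation formula).
    Hypothetical parameterization, assumed for illustration."""
    k = _normalize(axis)
    p = tuple(x - c for x, c in zip(point, pivot))  # move pivot to origin
    k_cross_p = _cross(k, p)
    k_dot_p = _dot(k, p)
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    rotated = tuple(
        p[i] * cos_a + k_cross_p[i] * sin_a + k[i] * k_dot_p * (1.0 - cos_a)
        for i in range(3)
    )
    return tuple(r + c for r, c in zip(rotated, pivot))  # move back


def prismatic(point, axis, displacement):
    """Translate `point` by `displacement` along the unit direction of `axis`.
    Hypothetical parameterization, assumed for illustration."""
    k = _normalize(axis)
    return tuple(x + displacement * c for x, c in zip(point, k))
```

In this sketch a revolute joint (for example a door hinge) is described by an axis direction and a pivot on that axis, while a prismatic joint (for example a drawer) needs only a direction; an optimization framework of the kind the abstract describes would estimate such parameters per part.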

Arslan Artykov, Tom Ravaud, Nicolás Violante-Grezzi, Vincent Lepetit • 2026

Related benchmarks

Task	Dataset	Result
Articulated Object Modeling	AiP-real	Axis Alignment Error (Deg): 1.13	22
Articulation Estimation	Video2Articulation Revolute S (test)	Axis Error (°): 0.00e+0	4
Articulation Estimation	Arti4D (test)	Axis Error: 3.6	4
Articulation Estimation	Video2Articulation Prismatic S (test)	Axis Accuracy (°): 0.00e+0	3
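
The axis metrics above are reported in degrees. The page does not define them, so the following is only a sketch of one common way to measure the angle between a predicted and a ground-truth joint axis. The function name `axis_error_deg` is hypothetical, and treating an axis and its negation as the same direction is an assumption; the benchmarks may use a different convention.

```python
import math


def axis_error_deg(pred_axis, gt_axis):
    """Angle in degrees between two 3D axis directions.

    Assumption: the sign of an axis is ignored, so a direction and its
    negation give an error of 0 and the result lies in [0, 90].
    """
    dot = sum(p * g for p, g in zip(pred_axis, gt_axis))
    norm_pred = math.sqrt(sum(p * p for p in pred_axis))
    norm_gt = math.sqrt(sum(g * g for g in gt_axis))
    if norm_pred == 0.0 or norm_gt == 0.0:
        raise ValueError("axes must be non-zero")
    # Clamp to guard against rounding slightly above 1.0 before acos.
    cos_angle = min(1.0, abs(dot) / (norm_pred * norm_gt))
    return math.degrees(math.acos(cos_angle))
```

Under this convention, perpendicular axes give the maximum error of 90 degrees, and the inputs do not need to be unit length.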
