Training-Free Robot Pose Estimation using Off-the-Shelf Foundational Models
About
Pose estimation of a robot arm from visual input is a challenging task. However, with the increasing adoption of robot arms in both industrial and residential settings, reliable joint angle estimation can offer improved safety and performance guarantees, and can also serve as a verifier for further training robot policies. This paper introduces the use of frontier vision-language models (VLMs) as an "off-the-shelf" tool for estimating a robot arm's joint angles from a single target image. By evaluating frontier VLMs on both synthetic and real-world image data, this paper establishes a performance baseline for current VLMs. In addition, it presents empirical results suggesting that neither test-time scaling nor parameter scaling alone leads to improved joint angle predictions.
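The query-and-parse loop implied above can be sketched in a few lines. This is a minimal, hedged illustration only: the prompt wording, the 7-joint Panda assumption, and the free-form reply format are assumptions for this sketch, not the paper's published protocol, and the actual VLM API call is omitted.

```python
import json
import re

# Hypothetical prompt asking a VLM for the 7 joint angles of a Franka Panda
# arm; the exact wording is an assumption, not the paper's prompt.
PROMPT = (
    "You are shown an image of a Franka Emika Panda robot arm. "
    "Estimate its 7 joint angles in radians and reply with a JSON list, "
    "e.g. [0.0, -0.78, 0.0, -2.36, 0.0, 1.57, 0.79]."
)

def parse_joint_angles(reply: str, n_joints: int = 7) -> list[float]:
    """Extract a list of joint angles from a free-form VLM reply.

    Tries strict JSON first, then falls back to pulling the first
    n_joints floating-point numbers out of the text.
    """
    try:
        angles = json.loads(reply)
        if isinstance(angles, list) and len(angles) == n_joints:
            return [float(a) for a in angles]
    except json.JSONDecodeError:
        pass
    # Fallback: grab signed decimal numbers from surrounding prose.
    numbers = re.findall(r"-?\d+(?:\.\d+)?", reply)
    if len(numbers) < n_joints:
        raise ValueError(f"expected {n_joints} angles, got {len(numbers)}")
    return [float(x) for x in numbers[:n_joints]]

# Example: a reply wrapped in prose still parses via the regex fallback.
reply = "The joint angles are approximately [0.1, -0.75, 0.02, -2.3, 0.05, 1.6, 0.8]."
print(parse_joint_angles(reply))
```

In practice the returned angles would be compared against ground truth (e.g. mean absolute joint-angle error) to produce the baseline numbers reported below.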
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Panda Arm Pose Estimation | DREAM-Mini panda_orb_full_view | -- | 3 |
| Panda Arm Pose Estimation | DREAM-Mini panda_sim_full_view | -- | 2 |
| Panda Arm Pose Estimation | DREAM-Mini panda_realsense_full_view | -- | 2 |