UniHOPE: A Unified Approach for Hand-Only and Hand-Object Pose Estimation
About
Estimating the 3D pose of a hand and a potential hand-held object from monocular images is a longstanding challenge. Yet, existing methods are specialized, focusing on either bare hands or hands interacting with objects. No method can flexibly handle both scenarios, and their performance degrades when applied to the other scenario. In this paper, we propose UniHOPE, a unified approach for general 3D hand-object pose estimation, flexibly adapting to both scenarios. Technically, we design a grasp-aware feature fusion module to integrate hand-object features, with an object switcher to dynamically control the hand-object pose estimation according to the grasping status. Further, to boost the robustness of hand pose estimation regardless of object presence, we generate realistic de-occluded image pairs to train the model to handle object-induced hand occlusions, and formulate multi-level feature enhancement techniques for learning occlusion-invariant features. Extensive experiments on three commonly-used benchmarks demonstrate UniHOPE's SOTA performance in addressing hand-only and hand-object scenarios. Code will be released on https://github.com/JoyboyWang/UniHOPE_Pytorch.
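The "object switcher" idea from the abstract can be illustrated with a minimal NumPy sketch: a predicted grasping probability gates how strongly object features are fused with hand features. This is a conceptual illustration only, not the paper's actual architecture; the function name and the additive fusion rule are assumptions.

```python
import numpy as np

def grasp_aware_fusion(hand_feat, obj_feat, grasp_logit):
    # Hypothetical sketch: the "object switcher" is modeled as a sigmoid
    # gate on a grasp-status logit; object features only contribute when
    # a grasp is detected (p_grasp -> 1).
    p_grasp = 1.0 / (1.0 + np.exp(-grasp_logit))
    fused = hand_feat + p_grasp * obj_feat  # assumed additive fusion
    return fused, p_grasp

hand = np.ones(4)
obj = np.full(4, 2.0)
fused_bare, p_bare = grasp_aware_fusion(hand, obj, -10.0)   # hand-only scene
fused_grasp, p_grasp = grasp_aware_fusion(hand, obj, 10.0)  # hand-object scene
```

With a strongly negative logit the gate closes and the fused feature reduces to the hand feature alone, so the same model degrades gracefully to the hand-only case.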
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Hand Pose Estimation | HO-3D (test) | Joint Error (mm) | 9.6 | 53 |
| Hand Pose Estimation | DexYCB (S0) | J-PE | 12.42 | 36 |
| Hand Pose Estimation | DexYCB S3 (test) | J-PE | 12.59 | 36 |
| Hand Pose Estimation | DexYCB (S1) | J-PE | 16.31 | 36 |
| Hand Pose Estimation | FreiHAND | J-PE | 13.53 | 24 |
| Hand Pose Estimation | HO-3D v2 (test) | F-score @ 5mm | 24.64 | 16 |
| Object Pose Estimation | DexYCB (S3) | ADD-0.5D (gelatin_box) | 26.23 | 8 |