EgoHandICL: Egocentric 3D Hand Reconstruction with In-Context Learning

About

Robust 3D hand reconstruction in egocentric vision is challenging due to depth ambiguity, self-occlusion, and complex hand-object interactions. Prior methods mitigate these issues by scaling training data or adding auxiliary cues, but they often struggle in unseen contexts. We present EgoHandICL, the first in-context learning (ICL) framework for 3D hand reconstruction that improves semantic alignment, visual consistency, and robustness under challenging egocentric conditions. EgoHandICL introduces complementary exemplar retrieval guided by vision-language models (VLMs), an ICL-tailored tokenizer for multimodal context, and a masked autoencoder (MAE)-based architecture trained with hand-guided geometric and perceptual objectives. Experiments on ARCTIC and EgoExo4D show consistent gains over state-of-the-art methods. We also demonstrate real-world generalization and improve EgoVLM hand-object interaction reasoning by using reconstructed hands as visual prompts. Code and data: https://github.com/Nicous20/EgoHandICL
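To make the retrieval step concrete, below is a minimal sketch assuming precomputed VLM embeddings and plain top-k cosine similarity. The paper's "complementary" retrieval strategy likely goes beyond this, and all names here (`retrieve_exemplars`, `query_emb`, `support_embs`) are illustrative, not from the released code.

```python
# Hypothetical sketch only: the abstract describes "complementary exemplar
# retrieval guided by VLMs" but does not give the algorithm; this shows the
# simplest plausible variant, top-k cosine similarity over precomputed VLM
# embeddings.
import torch

def retrieve_exemplars(query_emb: torch.Tensor,
                       support_embs: torch.Tensor,
                       k: int = 3) -> torch.Tensor:
    """Return indices of the k support exemplars most similar to the query.

    query_emb:    (D,)   VLM embedding of the query frame
    support_embs: (N, D) VLM embeddings of the candidate exemplar frames
    """
    q = query_emb / query_emb.norm()
    s = support_embs / support_embs.norm(dim=1, keepdim=True)
    sims = s @ q                     # (N,) cosine similarities
    return sims.topk(k).indices

# Toy usage with random embeddings; the retrieved exemplars would then be
# tokenized together with the query and fed to the MAE-based reconstructor.
torch.manual_seed(0)
query = torch.randn(512)
support = torch.randn(100, 512)
print(retrieve_exemplars(query, support, k=3))
```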

Binzhu Xie, Shi Qiu, Sicheng Zhang, Yinqiao Wang, Hao Xu, Muzammal Naseer, Chi-Wing Fu, Pheng-Ann Heng • 2026

Related benchmarks

| Task | Dataset | Metric | Result (mm) | Rank |
| --- | --- | --- | --- | --- |
| 3D Hand Mesh Reconstruction | EgoExo4D General Setting | MPJPE | 21.1 | 5 |
| 3D Hand Mesh Reconstruction | EgoExo4D Bimanual Setting | P-MPJPE | 7.5 | 5 |
| 3D Hand Mesh Reconstruction | ARCTIC General Setting | P-MPJPE | 4.0 | 5 |
| 3D Hand Mesh Reconstruction | ARCTIC Bimanual Setting | P-MPVPE | 3.7 | 5 |
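The metrics above are the standard 3D hand pose and mesh measures: MPJPE is the mean Euclidean distance between predicted and ground-truth joints, the "P-" prefix means the prediction is first aligned to the ground truth with a Procrustes similarity transform (removing global rotation, translation, and scale), and MPVPE computes the same error over mesh vertices instead of joints. A self-contained NumPy sketch (function names are ours, not from the EgoHandICL repository):

```python
import numpy as np

def mpjpe(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean per-joint position error: average Euclidean distance, (J, 3) in mm."""
    return float(np.linalg.norm(pred - gt, axis=1).mean())

def procrustes_align(pred: np.ndarray, gt: np.ndarray) -> np.ndarray:
    """Align pred onto gt with a similarity transform (rotation, scale, shift)."""
    mu_p, mu_g = pred.mean(0), gt.mean(0)
    p, g = pred - mu_p, gt - mu_g
    U, S, Vt = np.linalg.svd(p.T @ g)   # Kabsch/Umeyama alignment via SVD
    if np.linalg.det((U @ Vt).T) < 0:   # fix an improper rotation (reflection)
        Vt[-1] *= -1
        S[-1] *= -1
    R = (U @ Vt).T
    scale = S.sum() / (p ** 2).sum()
    return scale * p @ R.T + mu_g

def p_mpjpe(pred: np.ndarray, gt: np.ndarray) -> float:
    """Procrustes-aligned MPJPE; P-MPVPE is identical but over mesh vertices."""
    return mpjpe(procrustes_align(pred, gt), gt)

# Sanity check: a rotated, scaled, shifted copy of the ground truth has a
# large MPJPE but a P-MPJPE of ~0, since alignment removes the global transform.
rng = np.random.default_rng(0)
gt = rng.normal(size=(21, 3)) * 50          # 21 hand joints at mm scale
c, s = np.cos(0.3), np.sin(0.3)
R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
pred = 1.2 * gt @ R.T + np.array([10.0, -5.0, 3.0])
print(f"MPJPE:   {mpjpe(pred, gt):.2f} mm")
print(f"P-MPJPE: {p_mpjpe(pred, gt):.2f} mm")  # ~0.00
```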
