
DiffProxy: Multi-View Human Mesh Recovery via Diffusion-Generated Dense Proxies

About

Precise human mesh recovery (HMR) from multi-view images remains challenging: end-to-end methods produce entangled errors hard to localize, while fitting-based methods rely on sparse keypoints that provide limited surface constraints. We observe that the true bottleneck lies in the quality of intermediate representations, and that dense pixel-to-surface correspondences can be effectively generated by repurposing pre-trained diffusion models with rich visual priors. We propose DiffProxy, a Stable-Diffusion-based framework trained on large-scale synthetic data with pixel-perfect annotations. A multi-conditional proxy generator predicts dense correspondences from multi-view images, providing uniform surface constraints that enable precise fitting. Hand refinement feeds enlarged hand crops alongside full-body images for fine-grained detail, while test-time scaling exploits diffusion stochasticity to estimate per-pixel uncertainty. Trained only on synthetic data, DiffProxy achieves state-of-the-art results on five diverse real-world benchmarks. Project page: https://wrk226.github.io/DiffProxy.html
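The test-time scaling idea in the abstract can be illustrated with a small sketch: draw several stochastic samples for the same input, then use the per-pixel spread across samples as an uncertainty estimate. The abstract does not say how DiffProxy aggregates its samples, so everything here is an assumption for illustration. `sample_proxy` is a hypothetical stand-in for one diffusion sample, correspondences are simplified to one scalar per pixel, and the inverse-variance weighting is one common choice rather than the authors' method.

```python
import random
import statistics


def sample_proxy(rng, height, width):
    """Hypothetical stand-in for one stochastic diffusion sample.

    Returns a height x width grid with one scalar correspondence value
    per pixel. Noise grows with the column index so that some pixels
    are visibly less stable than others.
    """
    return [
        [(r + c) / (height + width) + rng.gauss(0.0, 0.01 * (1 + c))
         for c in range(width)]
        for r in range(height)
    ]


def aggregate_samples(samples):
    """Return the per-pixel mean and population standard deviation."""
    height, width = len(samples[0]), len(samples[0][0])
    mean = [[0.0] * width for _ in range(height)]
    std = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            values = [sample[r][c] for sample in samples]
            mean[r][c] = statistics.fmean(values)
            std[r][c] = statistics.pstdev(values)
    return mean, std


def confidence_weights(std, eps=1e-6):
    """Turn per-pixel std into inverse-variance weights scaled to (0, 1]."""
    raw = [[1.0 / (value * value + eps) for value in row] for row in std]
    peak = max(max(row) for row in raw)
    return [[weight / peak for weight in row] for row in raw]


rng = random.Random(0)
samples = [sample_proxy(rng, height=2, width=3) for _ in range(8)]
mean_map, std_map = aggregate_samples(samples)
weight_map = confidence_weights(std_map)
```

In a fitting stage, weights like `weight_map` could down-weight pixels where the samples disagree, so unstable correspondences contribute less to the fitting objective.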

Renke Wang, Zhenyu Zhang, Ying Tai, Jun Li, Jian Yang • 2026

Related benchmarks

Task                 Dataset           Result          Rank
Human Mesh Recovery  MPI-INF-3DHP      MPJPE 42        35
Human Mesh Recovery  MoYo              MPJPE 29.1      16
Human Mesh Recovery  RICH              PA-MPVPE 27.6   13
Human Mesh Recovery  BEHAVE            PA-MPJPE 22.7   7
Human Mesh Recovery  4D-DRESS          PA-MPJPE 17.3   7
Human Mesh Recovery  4D-DRESS partial  PA-MPJPE 22.7   7
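For readers unfamiliar with the metric names above, MPJPE (mean per-joint position error) is the average Euclidean distance between predicted and ground-truth 3D joints. The minimal sketch below assumes both joint sets are already expressed in the same coordinate frame. It does not perform the Procrustes alignment that the PA- variants apply before measuring error, and the page does not state units or alignment conventions, so none are assumed here.

```python
import math


def mpjpe(predicted, target):
    """Mean Euclidean distance between corresponding 3D joints."""
    if not predicted or len(predicted) != len(target):
        raise ValueError("joint lists must be non-empty and of equal length")
    total = sum(math.dist(p, t) for p, t in zip(predicted, target))
    return total / len(predicted)


# Two joints with errors of 3 and 4 units respectively.
predicted_joints = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
target_joints = [(0.0, 0.0, 3.0), (1.0, 4.0, 0.0)]
error = mpjpe(predicted_joints, target_joints)  # 3.5
```

The same function applied to mesh vertices instead of joints gives the per-vertex counterpart (MPVPE).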