DiffProxy: Multi-View Human Mesh Recovery via Diffusion-Generated Dense Proxies

About

Precise human mesh recovery (HMR) from multi-view images remains challenging: end-to-end methods produce entangled errors that are hard to localize, while fitting-based methods rely on sparse keypoints that provide only limited surface constraints. We observe that the true bottleneck lies in the quality of the intermediate representations, and that dense pixel-to-surface correspondences can be generated effectively by repurposing pre-trained diffusion models with rich visual priors. We propose DiffProxy, a Stable-Diffusion-based framework trained on large-scale synthetic data with pixel-perfect annotations. A multi-conditional proxy generator predicts dense correspondences from multi-view images, providing uniform surface constraints that enable precise fitting. Hand refinement feeds enlarged hand crops alongside the full-body images to capture fine-grained detail, while test-time scaling exploits diffusion stochasticity to estimate per-pixel uncertainty. Trained only on synthetic data, DiffProxy achieves state-of-the-art results on five diverse real-world benchmarks. Project page: https://wrk226.github.io/DiffProxy.html
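
The test-time scaling idea mentioned above (drawing several stochastic samples and measuring how much they disagree at each pixel) can be illustrated with a minimal sketch. The abstract does not say how DiffProxy aggregates its samples, so the per-pixel mean and population variance used here are assumptions, and `per_pixel_uncertainty` is a hypothetical helper name, not part of any released code.

```python
from statistics import mean, pvariance


def per_pixel_uncertainty(samples):
    """Aggregate K sampled proxy maps into a mean map and a variance map.

    samples: list of K maps, each a list of H rows of W floats.
    Assumption: per-pixel variance across samples stands in for uncertainty;
    the source does not specify the actual estimator.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to estimate uncertainty")
    height, width = len(samples[0]), len(samples[0][0])
    mean_map, var_map = [], []
    for y in range(height):
        mean_row, var_row = [], []
        for x in range(width):
            # Collect the K values predicted for this pixel.
            values = [sample[y][x] for sample in samples]
            mean_row.append(mean(values))
            var_row.append(pvariance(values))
        mean_map.append(mean_row)
        var_map.append(var_row)
    return mean_map, var_map
```

In a fitting stage, pixels with high variance could then be down-weighted, though whether DiffProxy does so is not stated in this summary.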

Renke Wang, Zhenyu Zhang, Ying Tai, Jun Li, Jian Yang • 2026

Related benchmarks

Task	Dataset	Result
Human Mesh Recovery	MPI-INF-3DHP	MPJPE 42	43
Human Mesh Recovery	RICH	MPJPE 29.6	19
Human Mesh Recovery	MoYo	MPJPE 29.1	16
Human Mesh Recovery	BEHAVE	PA-MPJPE 22.7	7
Human Mesh Recovery	4D-DRESS	PA-MPJPE 17.3	7
Human Mesh Recovery	4D-DRESS partial	PA-MPJPE 22.7	7
3D Human Body Fitting	4D-DRESS (test)	V2V Error 2.093	4
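
For readers unfamiliar with the metrics in the table, MPJPE (mean per-joint position error) is the average Euclidean distance between predicted and ground-truth 3D joints. The sketch below shows only that definition; the function name and the toy inputs are illustrative and are not taken from the benchmarks. PA-MPJPE applies a Procrustes (similarity) alignment of the prediction to the ground truth before computing the same average, and that alignment step is not implemented here.

```python
import math


def mpjpe(pred_joints, gt_joints):
    """Mean Euclidean distance between corresponding 3D joints.

    pred_joints, gt_joints: equal-length sequences of (x, y, z) tuples,
    expressed in the same unit and coordinate frame.
    """
    if not pred_joints or len(pred_joints) != len(gt_joints):
        raise ValueError("joint lists must be non-empty and equal in length")
    total = sum(math.dist(p, g) for p, g in zip(pred_joints, gt_joints))
    return total / len(pred_joints)
```

A vertex-to-vertex (V2V) error is typically computed the same way over mesh vertices instead of joints, assuming the predicted and reference meshes share a topology.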

