SHERF: Generalizable Human NeRF from a Single Image

About

Existing Human NeRF methods for reconstructing 3D humans typically rely on multiple 2D images from multi-view cameras or on monocular videos captured from fixed camera views. However, in real-world scenarios, human images are often captured from arbitrary camera angles, which makes high-quality 3D human reconstruction challenging. In this paper, we propose SHERF, the first generalizable Human NeRF model for recovering animatable 3D humans from a single input image. SHERF extracts and encodes 3D human representations in canonical space, enabling rendering and animation from free views and poses. To achieve high-fidelity novel view and pose synthesis, the encoded 3D human representations should capture both global appearance and local fine-grained textures. To this end, we propose a bank of 3D-aware hierarchical features, including global, point-level, and pixel-aligned features, to facilitate informative encoding. Global features enhance the information extracted from the single input image and complement the information missing from the partial 2D observation. Point-level features provide strong cues about 3D human structure, while pixel-aligned features preserve more fine-grained details. To effectively integrate the 3D-aware hierarchical feature bank, we design a feature fusion transformer. Extensive experiments on the THuman, RenderPeople, ZJU_MoCap, and HuMMan datasets demonstrate that SHERF achieves state-of-the-art performance, with better generalizability for novel view and pose synthesis.
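To make the idea of fusing a hierarchical feature bank with a transformer more concrete, here is a minimal, hypothetical sketch. It assumes that the global, point-level, and pixel-aligned features for one 3D query point have already been computed as vectors of the same dimension. The class name FeatureFusionSketch, the single attention head, the residual connection, the mean pooling, and the random (unlearned) weights are all illustrative assumptions. This is not SHERF's published architecture.

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def random_matrix(rows, cols, rng):
    # Small random weights stand in for learned projections (assumption).
    return [[rng.gauss(0.0, 1.0 / math.sqrt(cols)) for _ in range(cols)]
            for _ in range(rows)]


class FeatureFusionSketch:
    """Hypothetical single-head self-attention over the three feature tokens
    (global, point-level, pixel-aligned) of one 3D query point."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.Wq = random_matrix(dim, dim, rng)
        self.Wk = random_matrix(dim, dim, rng)
        self.Wv = random_matrix(dim, dim, rng)

    def __call__(self, global_feat, point_feat, pixel_feat):
        tokens = [global_feat, point_feat, pixel_feat]
        if any(len(t) != self.dim for t in tokens):
            raise ValueError("all feature vectors must have length dim")
        q = [matvec(self.Wq, t) for t in tokens]
        k = [matvec(self.Wk, t) for t in tokens]
        v = [matvec(self.Wv, t) for t in tokens]
        scale = 1.0 / math.sqrt(self.dim)
        fused_tokens = []
        for qi, ti in zip(q, tokens):
            # Each token attends to all three tokens (scaled dot-product attention).
            weights = softmax([dot(qi, kj) * scale for kj in k])
            attended = [sum(w * vj[d] for w, vj in zip(weights, v))
                        for d in range(self.dim)]
            # Residual connection keeps the token's original information.
            fused_tokens.append([a + t for a, t in zip(attended, ti)])
        # Mean-pool the fused tokens into a single per-point feature vector.
        n = len(fused_tokens)
        return [sum(tok[d] for tok in fused_tokens) / n for d in range(self.dim)]


fusion = FeatureFusionSketch(dim=4, seed=0)
fused = fusion([0.1, 0.2, 0.3, 0.4], [0.0, -0.1, 0.5, 0.2], [0.3, 0.3, -0.2, 0.1])
print(len(fused))
```

Attention lets each feature type reweight the others per query point, which is one plausible reason to prefer it over plain concatenation. In a real model, the fused vector would then be decoded into density and color for volume rendering.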

Shoukang Hu, Fangzhou Hong, Liang Pan, Haiyi Mei, Lei Yang, Ziwei Liu• 2023

Related benchmarks

Task	Dataset	Result
Novel View Synthesis	THuman 2.0 (test)	LPIPS 0.11	51
Human Novel View Synthesis	HuMMan	PSNR 20.83	9
Human Novel View Synthesis	THuman 2.0	PSNR 19.25	9
Novel Pose Synthesis	THuman	FID 37.48	7
Novel Pose Synthesis	RenderPeople	FID 38.98	7
Novel View Synthesis	RenderPeople	FID 36.54	7
Novel View Synthesis	THuman	FID 37.76	7
Human performance capture	4D-DRESS (test)	PSNR 21.86	6
Human Image Synthesis	HuGe100K in-the-wild (test)	User Preference Score 28.75	5
Human Motion and View Synthesis	HuGe100K (user study)	Identity & Appearance Preservation 13.55	5
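For reading the PSNR entries above, the standard definition is PSNR = 10 · log10(MAX² / MSE), in decibels, where higher is better. In contrast, LPIPS and FID are lower-is-better metrics. The helper below is a minimal sketch that assumes images are flattened into equal-length lists of values in [0, max_val]. The exact evaluation protocol behind each benchmark row, such as masking or cropping to the human region, may differ. The function name psnr is illustrative.

```python
import math


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two flattened images
    with pixel values in [0, max_val]."""
    if not pred or len(pred) != len(target):
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


print(psnr([0.5, 0.5], [0.6, 0.4]))  # MSE of about 0.01, roughly 20 dB
```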

