
HumanNOVA: Photorealistic, Universal and Rapid 3D Human Avatar Modeling from a Single Image

About

In this paper, we present HumanNOVA, a photorealistic, universal, and rapid model for generating 3D human avatars from a single RGB image. Achieving both photorealism and generalization is challenging because diverse, high-quality 3D human data are scarce. To address this, we build a scalable data generation pipeline based on two strategies. First, we take existing rigged assets and animate them with a wide range of everyday poses. Second, we take existing multi-camera captures of humans and apply fitting to generate more diverse views for training. Together, these strategies scale the dataset to 100k assets, substantially increasing both the quantity and the diversity of the data available for robust model training. Architecturally, HumanNOVA adopts a feed-forward, token-conditioned avatar modeling framework that runs inference in under one second and requires no test-time optimization. Given an input image and an estimated SMPL mesh (a simplified parametric body model without detailed geometry or appearance), the model first encodes both inputs into compact token representations. These tokens then serve as conditioning signals and are fused through cross-attention to construct a triplane-based 3D avatar representation. Extensive experiments on multiple benchmarks show that our approach outperforms prior methods both quantitatively and qualitatively and remains robust across diverse input image conditions. Project page: https://HumanNOVA.github.io
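The abstract describes encoding the image and the SMPL mesh into tokens and fusing them through cross-attention into a triplane. The NumPy sketch below illustrates that general pattern only; it is not the authors' implementation. The learnable plane query tokens, single-head attention, random weights, residual connection, plane order (XY, XZ, YZ), and summing the three plane features per point are all illustrative assumptions, and the real encoders, layer counts, and decoder are not described on this page.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head cross-attention: query tokens attend to context tokens."""
    q = queries @ w_q                          # (Nq, d)
    k = context @ w_k                          # (Nc, d)
    v = context @ w_v                          # (Nc, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (Nq, Nc)
    return softmax(scores, axis=-1) @ v        # (Nq, d)

def build_triplane(image_tokens, smpl_tokens, plane_queries, weights, res):
    """Fuse image and SMPL tokens (the conditions) into three (res, res, d) planes."""
    context = np.concatenate([image_tokens, smpl_tokens], axis=0)
    # Assumed residual update of the plane query tokens.
    fused = plane_queries + cross_attention(plane_queries, context, *weights)
    return fused.reshape(3, res, res, -1)      # assumed order: XY, XZ, YZ

def bilinear(plane, u, v):
    """Bilinearly sample a (res, res, d) plane at continuous coords u, v in [-1, 1]."""
    res = plane.shape[0]
    x = (u + 1) * 0.5 * (res - 1)
    y = (v + 1) * 0.5 * (res - 1)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, res - 1), min(y0 + 1, res - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * plane[y0, x0] + ax * plane[y0, x1]
    bottom = (1 - ax) * plane[y1, x0] + ax * plane[y1, x1]
    return (1 - ay) * top + ay * bottom

def query_point(triplane, p):
    """Feature of a 3D point p = (x, y, z) in [-1, 1]^3: sum of its three plane projections."""
    x, y, z = p
    xy, xz, yz = triplane
    return bilinear(xy, x, y) + bilinear(xz, x, z) + bilinear(yz, y, z)

# Toy usage with random tokens standing in for the (unspecified) encoder outputs.
rng = np.random.default_rng(0)
d, res = 16, 8
image_tokens = rng.normal(size=(64, d))
smpl_tokens = rng.normal(size=(32, d))
plane_queries = rng.normal(size=(3 * res * res, d))
weights = tuple(rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
tri = build_triplane(image_tokens, smpl_tokens, plane_queries, weights, res)
feat = query_point(tri, (0.1, -0.3, 0.5))
print(tri.shape, feat.shape)  # (3, 8, 8, 16) (16,)
```

A triplane stores features on three axis-aligned 2D grids, so memory grows with res² rather than res³ as in a dense voxel grid; any 3D point is featurized by projecting it onto the three planes. In a full model, the per-point feature would then be decoded (for example, into density and color) by a small network, which this sketch omits.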

Hezhen Hu, Wangbo Zhao, Lanqing Guo, Hanwen Jiang, Jonathan C. Liu, Zhiwen Fan, Kai Wang, Zhangyang Wang, Georgios Pavlakos · 2026

Related benchmarks

Task | Dataset | Metric | Result | Rank
--- | --- | --- | --- | ---
3D human reconstruction | CustomHuman (test) | Chamfer Distance (P2G) | 1.052 | 16
3D human reconstruction | THuman2 (test) | Chamfer Distance (Pred to GT) | 1.027 | 16
3D human reconstruction | 2K2K (test) | Chamfer Distance (Pred to GT) | 1.045 | 16
Single-view 3D Human Reconstruction | CustomHuman Frontal view 47 | PSNR (dB) | 22.29 | 8
Single-view 3D Human Reconstruction | CustomHuman Side view 47 | PSNR (dB) | 22.52 | 8
Single-view 3D Human Reconstruction | THuman2 Frontal view 74 | PSNR (dB) | 23.96 | 8
Single-view 3D Human Reconstruction | THuman2 Side view 74 | PSNR (dB) | 24.35 | 8
Single-view 3D Human Reconstruction | 2K2K Frontal view 17 | PSNR (dB) | 22.65 | 8
Single-view 3D Human Reconstruction | 2K2K Side view 17 | PSNR (dB) | 23.07 | 8
Human Avatar Animation | CustomHuman | PSNR (dB) | 22.29 | 3
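The table reports a one-directional Chamfer distance (prediction to ground truth, lower is better) for geometry and PSNR (higher is better) for rendered views. The exact evaluation protocol (units, surface sampling density, alignment, squared versus unsquared distances) is not given on this page, so the sketch below is only an assumed, common definition: unsquared Euclidean nearest-neighbor distances computed by brute force, and images with values in [0, 1].

```python
import numpy as np

def chamfer_pred_to_gt(pred, gt):
    """Mean Euclidean distance from each predicted point to its nearest GT point."""
    diff = pred[:, None, :] - gt[None, :, :]   # (Np, Ng, 3) via broadcasting
    dists = np.linalg.norm(diff, axis=-1)      # (Np, Ng) pairwise distances
    return dists.min(axis=1).mean()            # nearest GT point per prediction

def psnr(pred_img, gt_img, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images with values in [0, max_val]."""
    mse = np.mean((pred_img - gt_img) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)

gt = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
pred = np.array([[0.0, 0.1, 0.0], [1.0, 0.0, 0.2]])
print(round(chamfer_pred_to_gt(pred, gt), 3))  # 0.15
print(round(psnr(np.zeros((4, 4, 3)), np.full((4, 4, 3), 0.1)), 2))  # 20.0
```

The brute-force pairwise distance matrix is O(Np·Ng) in memory; for the dense point samples typical of mesh evaluation, a KD-tree nearest-neighbor query is the usual substitute.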
