NeuMan: Neural Human Radiance Field from a Single Video
About
Photorealistic rendering and reposing of humans is important for enabling augmented reality experiences. We propose a novel framework that reconstructs both the human and the scene from a single in-the-wild video, so that they can be rendered with novel human poses and from novel views. Given a video captured by a moving camera, we train two NeRF models: a human NeRF model and a scene NeRF model. To train these models, we rely on existing methods to estimate the rough geometry of the human and the scene. These rough geometry estimates allow us to create a warping field from the observation space to the canonical, pose-independent space, in which we train the human model. Our method learns subject-specific details, including cloth wrinkles and accessories, from just a 10-second video clip, and provides high-quality renderings of the human under novel poses, from novel views, together with the background.
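The warping step described above can be illustrated with a minimal sketch. It assumes the rough human geometry is available as a posed body mesh with one rigid canonical-to-posed transform per vertex, and it warps each query point with the inverse transform of its nearest vertex. The function name `warp_to_canonical`, its arguments, and the nearest-vertex rule are illustrative assumptions, not the authors' released implementation.

```python
import numpy as np


def warp_to_canonical(points, posed_vertices, vertex_transforms):
    """Map observation-space points into the canonical space (hypothetical helper).

    points:            (N, 3) query points in observation space
    posed_vertices:    (V, 3) vertices of the rough posed body mesh
    vertex_transforms: (V, 4, 4) rigid transforms taking each vertex
                       from canonical space to observation space
    """
    # Squared distance from every query point to every posed vertex: (N, V).
    diff = points[:, None, :] - posed_vertices[None, :, :]
    dist2 = (diff ** 2).sum(axis=-1)

    # Assumption: each point follows the motion of its nearest mesh vertex.
    nearest = dist2.argmin(axis=1)

    # Invert the canonical-to-observation transform of that vertex.
    inverse = np.linalg.inv(vertex_transforms[nearest])  # (N, 4, 4)

    # Apply the inverse transform in homogeneous coordinates.
    ones = np.ones((points.shape[0], 1))
    homogeneous = np.concatenate([points, ones], axis=1)  # (N, 4)
    canonical = np.einsum("nij,nj->ni", inverse, homogeneous)
    return canonical[:, :3]
```

In a setup like this, the human model would be queried at the warped points rather than at the raw ray samples, so a single canonical model can explain frames with different poses. The brute-force distance computation is only practical for small point sets; a spatial index would be a natural replacement at scale.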
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Novel Pose Synthesis | InterHand Single hand → Single hand 2.6M (test) | PSNR 31.8419 | 18 |
| Novel View Synthesis | ZJU-MoCap novel view setting | PSNR 28.96 | 14 |
| Monocular dynamic scene reconstruction | HOSNeRF (test) | Backpack PSNR 21.21 | 12 |
| Novel Pose Synthesis | ZJU-MoCap (Novel Pose) | PSNR 28.75 | 10 |
| 4D Human Reconstruction | NeuMan Citron sequence (test) | PSNR 18.39 | 10 |
| Scene Reconstruction | NeuMan (test) | PSNR (Seattle) 23.99 | 8 |
| Novel Pose Synthesis | InterHand Single hand → Interacting hands 2.6M (test) | PSNR 24.9451 | 6 |
| Human Reconstruction | NeuMan (test) | PSNR 29.32 | 6 |
| Human Avatar Reconstruction | Neuman Jogging | PSNR 17.57 | 5 |
| Human Avatar Reconstruction | Neuman Seattle | PSNR 18.42 | 5 |
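
All results in the table are reported as PSNR, in dB. For reference, the standard definition is 10 · log10(MAX² / MSE), sketched below for images with values in [0, 1]. The helper name `psnr` and the `max_value` parameter are illustrative, and the individual benchmarks may apply their own masks, crops, or value ranges before computing the metric.

```python
import numpy as np


def psnr(rendered, reference, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two same-shaped images."""
    rendered = np.asarray(rendered, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)

    # Mean squared error over all pixels and channels.
    mse = np.mean((rendered - reference) ** 2)
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return float("inf")
    return 10.0 * np.log10(max_value ** 2 / mse)
```

Because the scale is logarithmic, a gain of about 3 dB corresponds to roughly halving the mean squared error.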