Unsupervised Learning of Efficient Geometry-Aware Neural Articulated Representations
About
We propose an unsupervised method for 3D geometry-aware representation learning of articulated objects, in which no image-pose pairs or foreground masks are used for training. Though photorealistic images of articulated objects can be rendered with explicit pose control through existing 3D neural representations, these methods require ground truth 3D pose and foreground masks for training, which are expensive to obtain. We obviate this need by learning the representations with GAN training. The generator is trained to produce realistic images of articulated objects from random poses and latent vectors by adversarial training. To avoid a high computational cost for GAN training, we propose an efficient neural representation for articulated objects based on tri-planes and then present a GAN-based framework for its unsupervised training. Experiments demonstrate the efficiency of our method and show that GAN-based training enables the learning of controllable 3D representations without paired supervision.
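To make the tri-plane idea concrete, here is a minimal NumPy sketch of looking up a feature for a 3D point from three axis-aligned feature planes. It is not the authors' implementation. The function names (`bilinear_sample`, `triplane_features`), the plane ordering (xy, xz, yz), the `[-1, 1]` coordinate range, and aggregation by summation are all assumptions made for illustration, and the articulated part of the method (how points are related to the posed skeleton) is not shown.

```python
import numpy as np

def bilinear_sample(plane, u, v):
    """Bilinearly sample a feature plane.

    plane: array of shape (C, H, W).
    u, v:  arrays of shape (N,), normalized coordinates in [-1, 1]
           (u indexes width, v indexes height).
    Returns an array of shape (N, C).
    """
    C, H, W = plane.shape
    # Map normalized coordinates to continuous pixel indices.
    x = np.clip((u + 1.0) * 0.5 * (W - 1), 0, W - 1)
    y = np.clip((v + 1.0) * 0.5 * (H - 1), 0, H - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    y1 = np.minimum(y0 + 1, H - 1)
    wx = x - x0
    wy = y - y0
    # Interpolate along x on the two neighbouring rows, then along y.
    top = plane[:, y0, x0] * (1 - wx) + plane[:, y0, x1] * wx  # (C, N)
    bot = plane[:, y1, x0] * (1 - wx) + plane[:, y1, x1] * wx  # (C, N)
    return (top * (1 - wy) + bot * wy).T

def triplane_features(planes, points):
    """Project 3D points onto three planes and sum the sampled features.

    planes: array of shape (3, C, H, W), assumed order xy, xz, yz.
    points: array of shape (N, 3) with coordinates in [-1, 1].
    Returns an array of shape (N, C).
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    f_xy = bilinear_sample(planes[0], x, y)
    f_xz = bilinear_sample(planes[1], x, z)
    f_yz = bilinear_sample(planes[2], y, z)
    # Summation is an assumed aggregation; concatenation is another option.
    return f_xy + f_xz + f_yz
```

The appeal of this layout is that a 3D query costs three 2D lookups instead of a pass through a large 3D network or voxel grid, which is the kind of saving that matters when every GAN training step has to render many points.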
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| 3D Human Generation | DeepFashion (test) | FID | 77.03 | 9 |
| 3D Human Generation | SHHQ (test) | FID | 80.54 | 7 |
| Controllable Human Avatar Generation | DeepFashion | FID | 68.62 | 5 |
| Controllable Human Avatar Generation | UBC | FID | 36.39 | 5 |
| Controllable Human Avatar Generation | UBC 68 | Exp Score | 10.7 | 4 |
| Controllable Human Avatar Generation | SHHQ 18 | Exp Accuracy | 14.51 | 4 |
| 3D Human Synthesis | DeepFashion | RGB Fidelity | 0.00e+0 | 4 |
| 3D Human Synthesis | MPV | RGB Score | 0.00e+0 | 4 |
| 3D Human Synthesis | UBC | RGB Fidelity | 0.6 | 4 |
| 3D Human Synthesis | SHHQ | RGB Fidelity | 0.00e+0 | 4 |