GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields

About

Deep generative models allow for photorealistic image synthesis at high resolutions. But for many applications, this is not enough: content creation also needs to be controllable. While several recent works investigate how to disentangle underlying factors of variation in the data, most of them operate in 2D and hence ignore that our world is three-dimensional. Further, only a few works consider the compositional nature of scenes. Our key hypothesis is that incorporating a compositional 3D scene representation into the generative model leads to more controllable image synthesis. Representing scenes as compositional generative neural feature fields allows us to disentangle one or multiple objects from the background, as well as individual objects' shapes and appearances, while learning from unstructured and unposed image collections without any additional supervision. Combining this scene representation with a neural rendering pipeline yields a fast and realistic image synthesis model. As evidenced by our experiments, our model is able to disentangle individual objects and allows for translating and rotating them in the scene as well as changing the camera pose.

Michael Niemeyer, Andreas Geiger • 2020
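
To make the word "compositional" in the abstract a little more concrete, here is a minimal sketch of how several per-entity feature fields could be merged at a single 3D point. It assumes that each entity (an object or the background) predicts a non-negative density and a feature vector at that point, and that the scene-level prediction sums the densities and takes the density-weighted mean of the features. The abstract does not spell out the operator, so treat this rule, the function name `compose_fields`, and the `eps` parameter as illustrative assumptions rather than the authors' implementation.

```python
from typing import List, Sequence, Tuple


def compose_fields(
    densities: Sequence[float],
    features: Sequence[Sequence[float]],
    eps: float = 1e-8,
) -> Tuple[float, List[float]]:
    """Merge per-entity (density, feature) predictions at one 3D point.

    Assumed rule (not stated in the abstract): the combined density is the
    sum of the entity densities, and the combined feature is the
    density-weighted mean of the entity features.
    """
    if len(densities) != len(features) or not densities:
        raise ValueError("need one feature vector per density, and at least one entity")

    total_density = sum(densities)
    dim = len(features[0])
    # Guard against division by zero where every entity reports empty space.
    norm = max(total_density, eps)
    combined = [
        sum(d * f[k] for d, f in zip(densities, features)) / norm
        for k in range(dim)
    ]
    return total_density, combined


# Two entities at the same point: the denser one dominates the feature.
density, feature = compose_fields([1.0, 3.0], [[1.0, 0.0], [0.0, 1.0]])
print(density, feature)  # 4.0 [0.25, 0.75]
```

Under this assumption, translating or rotating one object only changes where its own field is evaluated, which is one way to picture the per-object control the abstract describes.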

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Unconditional image synthesis | FFHQ 256x256 (test) | FID 31.2 | 31 |
| 3D Object Detection | KITTI (val) | AP3D R40 Easy 20.79 | 24 |
| Image Generation | FFHQ 256x256 (train) | FID 32.6 | 20 |
| Unconditional image synthesis | AFHQ 256x256 (test) | FID 33.5 | 12 |
| 3D-aware Image Synthesis | FFHQ | FID 34.6 | 10 |
| 3D-aware Image Synthesis | CelebA-HQ | FID 36 | 8 |
| Pose-controlled face generation | FFHQ (test) | FID 35 | 8 |
| Geometry Generation | WOD-Vehicle (val) | Consistency 15.87 | 6 |
| Image Generation | WOD-Vehicle (val) | FID 105.3 | 6 |
| Image Synthesis | AFHQ | FID 212.7 | 6 |
Showing 10 of 26 rows
