Image Synthesis From Reconfigurable Layout and Style
About
Despite remarkable recent progress on both unconditional and conditional image synthesis, it remains a long-standing problem to learn generative models that can synthesize realistic and sharp images from reconfigurable spatial layout (i.e., bounding boxes + class labels in an image lattice) and style (i.e., structural and appearance variations encoded by latent vectors), especially at high resolution. By reconfigurable, we mean that a model preserves the intrinsic one-to-many mapping from a given layout to multiple plausible images with different styles, and adapts to perturbations of both the layout and the style latent code. In this paper, we present a layout- and style-based architecture for generative adversarial networks (termed LostGANs) that can be trained end-to-end to generate images from reconfigurable layout and style. Inspired by the vanilla StyleGAN, the proposed LostGAN consists of two new components: (i) learning fine-grained mask maps in a weakly-supervised manner to bridge the gap between layouts and images, and (ii) learning object instance-specific layout-aware feature normalization (ISLA-Norm) in the generator to realize multi-object style generation. In experiments on the COCO-Stuff and Visual Genome datasets, the proposed method achieves state-of-the-art performance. The code and pretrained models are available at https://github.com/iVMCL/LostGANs.
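To make the idea behind ISLA-Norm more concrete, here is a minimal NumPy sketch of instance-specific, layout-aware normalization. It is not the authors' implementation (that lives in the repository above), and several parts are assumptions: hard rectangular box masks stand in for the learned fine-grained masks, each object's code (e.g. a label embedding concatenated with its style latent) is mapped to per-channel scale and shift by a single linear projection, and the scale is applied residually as (1 + gamma) so pixels outside every mask keep their normalized features. The names `box_masks`, `isla_norm`, `obj_codes`, `w_gamma`, and `w_beta` are hypothetical.

```python
import numpy as np


def box_masks(boxes, height, width):
    """Hypothetical helper: rasterize normalized (x0, y0, w, h) boxes into
    binary masks of shape (num_objects, height, width).
    Assumption: hard box masks replace LostGAN's learned soft masks."""
    masks = np.zeros((len(boxes), height, width))
    for i, (x0, y0, w, h) in enumerate(boxes):
        c0, r0 = int(round(x0 * width)), int(round(y0 * height))
        c1, r1 = int(round((x0 + w) * width)), int(round((y0 + h) * height))
        masks[i, r0:r1, c0:c1] = 1.0
    return masks


def isla_norm(features, masks, obj_codes, w_gamma, w_beta, eps=1e-5):
    """Sketch of instance-specific layout-aware normalization.

    features:  (B, C, H, W) generator feature maps
    masks:     (B, N, H, W) one mask per object instance
    obj_codes: (B, N, D) per-object code (e.g. label embedding + style latent)
    w_gamma, w_beta: (D, C) hypothetical linear projection weights
    """
    # Batch-norm style normalization without a learned affine transform.
    mean = features.mean(axis=(0, 2, 3), keepdims=True)
    var = features.var(axis=(0, 2, 3), keepdims=True)
    normalized = (features - mean) / np.sqrt(var + eps)

    # Per-instance affine parameters, shape (B, N, C).
    gamma = obj_codes @ w_gamma
    beta = obj_codes @ w_beta

    # Spread each instance's parameters over its mask, summing where masks
    # overlap, to get spatially varying maps of shape (B, C, H, W).
    gamma_map = np.einsum("bnc,bnhw->bchw", gamma, masks)
    beta_map = np.einsum("bnc,bnhw->bchw", beta, masks)

    # Assumption: residual scale (1 + gamma) keeps uncovered pixels unchanged.
    return (1.0 + gamma_map) * normalized + beta_map
```

Because the affine parameters are computed per object and placed through that object's mask, changing one object's style code only alters the features inside its own region, which is the property that lets a single layout map to many differently styled images.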
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Layout-to-Image Synthesis | Visual Genome (VG) (test) | FID | 29.36 | 35 |
| Layout-to-Image Synthesis | COCO-Stuff (test) | FID | 29.65 | 25 |
| Layout-to-Image Generation | COCO Stuff | FID | 29.65 | 23 |
| Layout-to-Image Generation | Visual Genome | FID | 29.36 | 20 |
| Object Detection | nuImages | mAP | 35.6 | 20 |
| Layout-to-Image Generation | COCO | SceneFID | 20.03 | 6 |
| Layout-to-Image Generation | VG | SceneFID | 13.17 | 5 |
| Remote Sensing Image Generation | DIOR | FID | 57.1 | 5 |
| Object Classification | COCO Stuff | Accuracy | 28.81 | 4 |
| Object Classification | Visual Genome | Accuracy | 27.5 | 4 |
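Most image-generation rows above report FID, the Fréchet distance between Gaussian fits to feature embeddings of real and generated images; lower is better. The minimal NumPy sketch below computes that distance, assuming the feature vectors have already been extracted (the usual choice is Inception-v3 pooling features, which is not shown here). The function names `frechet_distance` and `fid_from_features` are hypothetical, and this is not the evaluation code behind the reported numbers.

```python
import numpy as np


def _sqrtm_psd(mat):
    """Square root of a symmetric positive semi-definite matrix via eigh."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # clip tiny negative eigenvalues from round-off
    return (vecs * np.sqrt(vals)) @ vecs.T


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 (sigma1 sigma2)^{1/2})."""
    diff = mu1 - mu2
    root1 = _sqrtm_psd(sigma1)
    # Tr((S1 S2)^{1/2}) equals Tr((S1^{1/2} S2 S1^{1/2})^{1/2}), and the
    # inner matrix is symmetric PSD, so eigh can be used safely.
    inner = root1 @ sigma2 @ root1
    inner = (inner + inner.T) / 2.0  # remove numerical asymmetry
    covmean_trace = np.trace(_sqrtm_psd(inner))
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * covmean_trace)


def fid_from_features(feats_real, feats_fake):
    """Fit a Gaussian to each (num_samples, dim) feature set and compare them."""
    mu1, mu2 = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    s1 = np.cov(feats_real, rowvar=False)
    s2 = np.cov(feats_fake, rowvar=False)
    return frechet_distance(mu1, s1, mu2, s2)
```

For example, two Gaussians with identity covariance whose means differ by (3, 4) have a Fréchet distance of 25, the squared distance between the means, since the covariance terms cancel.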