Local Class-Specific and Global Image-Level Generative Adversarial Networks for Semantic-Guided Scene Generation
About
In this paper, we address the task of semantic-guided scene generation. One open challenge in scene generation is generating small objects and detailed local texture, a difficulty widely observed in global image-level generation methods. To tackle this issue, we consider learning scene generation in a local context and design a local class-specific generative network guided by semantic maps. The network constructs and learns separate sub-generators, each concentrating on the generation of a different class, and is able to provide more scene details. To learn more discriminative class-specific feature representations for the local generation, we also propose a novel classification module. To combine the advantages of the global image-level and the local class-specific generation, we design a joint generation network that embeds an attention fusion module and a dual-discriminator structure. Extensive experiments on two scene image generation tasks show superior generation performance of the proposed model, which establishes state-of-the-art results by large margins on both tasks and on challenging public benchmarks. The source code and trained models are available at https://github.com/Ha0Tang/LGGAN.
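
The two mechanisms named in the abstract, class-specific local generation guided by a semantic map and attention-based fusion of the global and local outputs, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the repository linked above for that). The function names `class_masks`, `local_generation`, and `attention_fusion`, the toy sub-generators, and the use of a two-way per-pixel softmax for fusion are all assumptions made for illustration.

```python
import numpy as np

def class_masks(semantic_map, num_classes):
    """One-hot masks of shape (num_classes, H, W) from an integer label map (H, W)."""
    classes = np.arange(num_classes)[:, None, None]
    return (semantic_map[None, :, :] == classes).astype(np.float32)

def local_generation(features, semantic_map, sub_generators):
    """Run one (hypothetical) sub-generator per class on the masked features.

    features: array of shape (C, H, W); each sub-generator maps (C, H, W) -> (C, H, W).
    Each pixel of the result comes only from the sub-generator of its own class.
    """
    masks = class_masks(semantic_map, len(sub_generators))
    out = np.zeros(features.shape, dtype=np.float32)
    for mask, generator in zip(masks, sub_generators):
        out = out + mask * generator(features * mask)
    return out

def attention_fusion(global_img, local_img, logits):
    """Blend two images with per-pixel softmax weights; logits has shape (2, H, W)."""
    e = np.exp(logits - logits.max(axis=0, keepdims=True))
    weights = e / e.sum(axis=0, keepdims=True)
    return weights[0] * global_img + weights[1] * local_img

# Toy example: a 2x2 label map with two classes and trivial stand-in sub-generators.
semantic_map = np.array([[0, 1], [1, 0]])
features = np.ones((1, 2, 2), dtype=np.float32)
sub_generators = [lambda x: x + 1.0, lambda x: x * 3.0]

local_img = local_generation(features, semantic_map, sub_generators)
global_img = np.zeros_like(local_img)
fused = attention_fusion(global_img, local_img, np.zeros((2, 2, 2), dtype=np.float32))
```

In a real model the sub-generators and the attention logits would be learned networks. The sketch only shows how a semantic map can route each pixel to the sub-generator of its class, and how per-pixel weights can combine the two branches.
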
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Semantic Image Synthesis | ADE20K | FID | 31.6 | 66 |
| Semantic Image Synthesis | Cityscapes | FID | 57.7 | 54 |
| Semantic Image Synthesis | ADE20K (val) | FID | 31.6 | 47 |
| Semantic Image Synthesis | Cityscapes (val) | mIoU | 68.4 | 15 |
| Aerial-to-Ground Image Translation | CVUSA (test) | Top-1 Accuracy | 44.75 | 10 |
| Cross-view Image Translation (aerial-to-ground) | Dayton (test) | Top-1 Accuracy | 48.17 | 9 |
| Semantic Image Synthesis | Cityscapes | AMT | 67.38 | 4 |
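
For readers unfamiliar with the mIoU metric in the table, the sketch below shows how mean intersection-over-union is computed between two label maps. It is a simplified illustration under stated assumptions: the function name `mean_iou` is hypothetical, the score is computed on a single pair of label maps rather than accumulated over a whole validation set, and classes absent from both maps are skipped. Benchmark evaluations typically obtain the predicted labels by running a pretrained segmentation network on the generated images, a step not shown here.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union between two integer label maps of equal shape."""
    ious = []
    for c in range(num_classes):
        pred_c, target_c = pred == c, target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:
            continue  # class absent from both maps: leave it out of the average
        intersection = np.logical_and(pred_c, target_c).sum()
        ious.append(intersection / union)
    return float(np.mean(ious))

# Toy example with two classes on a 2x2 map.
pred = np.array([[0, 1], [1, 1]])
target = np.array([[0, 1], [0, 1]])
score = mean_iou(pred, target, num_classes=2)
```

Here class 0 has an IoU of 1/2 and class 1 has an IoU of 2/3, so the mean is 7/12. The table reports mIoU on a 0 to 100 scale, which corresponds to this value multiplied by 100.
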