Cross-View Image Synthesis using Conditional GANs

About

Learning to generate natural scenes has always been a challenging task in computer vision. It is even more painstaking when the generation is conditioned on images with drastically different views. This is mainly because understanding, matching, and transforming appearance and semantic information across the views is not trivial. In this paper, we attempt to solve the novel problem of cross-view image synthesis, aerial to street-view and vice versa, using conditional generative adversarial networks (cGAN). Two new architectures called Crossview Fork (X-Fork) and Crossview Sequential (X-Seq) are proposed to generate scenes with resolutions of 64x64 and 256x256 pixels. The X-Fork architecture has a single discriminator and a single generator. The generator hallucinates both the image and its semantic segmentation in the target view. The X-Seq architecture uses two cGANs. The first one generates the target image, which is subsequently fed to the second cGAN to generate its corresponding semantic segmentation map. The feedback from the second cGAN helps the first cGAN generate sharper images. Both of our proposed architectures learn to generate natural images as well as their semantic segmentation maps. The proposed methods capture and maintain the true semantics of objects in the source and target views better than the traditional image-to-image translation method, which considers only the visual appearance of the scene. Extensive qualitative and quantitative evaluations support the effectiveness of our frameworks, compared to two state-of-the-art methods, for natural scene generation across drastically different views.
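
The difference between the two generator layouts described in the abstract can be illustrated with a toy data-flow sketch. This is not the authors' implementation: the "networks" below are stand-in per-pixel affine maps, and all names (make_toy_net, shared_trunk, image_head, seg_head, g1, g2) are hypothetical. The shared-trunk-then-fork layout for X-Fork is an assumption suggested by the name; the abstract only states that one generator produces both outputs. Discriminators and training are omitted.

```python
import random
from typing import Callable, List, Tuple

# Toy single-channel image as nested lists of floats (illustrative only).
Image = List[List[float]]


def make_toy_net(seed: int) -> Callable[[Image], Image]:
    """Return a stand-in 'network': a fixed per-pixel affine map (hypothetical)."""
    rng = random.Random(seed)
    scale = rng.uniform(0.5, 1.5)
    shift = rng.uniform(-0.1, 0.1)
    return lambda img: [[scale * p + shift for p in row] for row in img]


# Hypothetical components; real models would be convolutional networks.
shared_trunk = make_toy_net(0)  # assumed shared part of the X-Fork generator
image_head = make_toy_net(1)    # branch that outputs the target-view image
seg_head = make_toy_net(2)      # branch that outputs the segmentation map
g1 = make_toy_net(3)            # X-Seq first generator: source -> target image
g2 = make_toy_net(4)            # X-Seq second generator: target image -> segmentation


def x_fork_generator(source: Image) -> Tuple[Image, Image]:
    """X-Fork style: a single generator emits both image and segmentation."""
    features = shared_trunk(source)
    return image_head(features), seg_head(features)


def x_seq_generators(source: Image) -> Tuple[Image, Image]:
    """X-Seq style: the first output is fed to a second generator."""
    target_image = g1(source)
    target_seg = g2(target_image)  # segmentation depends on the generated image
    return target_image, target_seg


# Usage: a 2x2 "aerial" input mapped to (image, segmentation) pairs.
aerial = [[0.1, 0.2], [0.3, 0.4]]
fork_image, fork_seg = x_fork_generator(aerial)
seq_image, seq_seg = x_seq_generators(aerial)
```

In the sequential layout the segmentation is computed from the generated image rather than from the source, which is what would let an error signal from the second stage reach the first generator during training, as the abstract describes.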

Krishna Regmi, Ali Borji • 2018

Related benchmarks

Task | Dataset | Result | Rank
Aerial-to-Ground Image Synthesis | CVUSA | Top-1 Acc 2.6738 | 23
Aerial-to-Ground Image Synthesis | Dayton 256x256 | Top-1 Accuracy 2.2402 | 22
Aerial-to-Ground Image Synthesis | Dayton 64x64 | Top-1 Accuracy 1.5908 | 18
Ground-to-Aerial Image Synthesis | Dayton 64x64 | Top-1 Accuracy 1.3189 | 18
Ground-to-Aerial Image Synthesis | Dayton 256x256 | Top-1 Accuracy 19.62 | 14
Aerial-to-Ground Image Translation | CVUSA (test) | Top-1 Accuracy 34.69 | 10
Cross-View Image Translation | Dayton 64x64 (test) | SSIM 0.5031 | 9
Cross-view Image Translation (aerial-to-ground) | Dayton (test) | Top-1 Accuracy 34.67 | 9
Cross-View Image Translation | Dayton 256x256 (test) | SSIM 0.3708 | 8
Cross-view Image Synthesis | Dayton 64x64 (test) | SSIM 0.5171 | 6
Showing 10 of 24 rows
