StarGAN v2: Diverse Image Synthesis for Multiple Domains
About
A good image-to-image translation model should learn a mapping between different visual domains while satisfying the following properties: 1) diversity of generated images and 2) scalability over multiple domains. Existing methods address only one of these issues, yielding either limited diversity or a separate model for every domain. We propose StarGAN v2, a single framework that tackles both and shows significantly improved results over the baselines. Experiments on CelebA-HQ and a new animal faces dataset (AFHQ) validate the superiority of our model in terms of visual quality, diversity, and scalability. To better assess image-to-image translation models, we release AFHQ, a dataset of high-quality animal faces with large inter- and intra-domain differences. The code, pretrained models, and dataset can be found at https://github.com/clovaai/stargan-v2.
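The two synthesis modes that appear in the benchmarks below, latent-guided and reference-guided, differ only in where the style code comes from: StarGAN v2's mapping network produces a style code from a random latent vector and a target domain, while its style encoder extracts one from a reference image. The sketch below illustrates this split with toy linear maps standing in for the real networks (all shapes and the `generate` combination rule are illustrative, not the paper's architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_DOMAINS = 3   # e.g. cat, dog, wildlife in AFHQ
LATENT_DIM = 16
STYLE_DIM = 8
IMG_DIM = 64      # flattened toy "image"

# Toy stand-ins for StarGAN v2's networks: one output head per domain,
# so a single module covers all domains (scalability).
F_heads = rng.normal(size=(NUM_DOMAINS, STYLE_DIM, LATENT_DIM))  # mapping network F
E_heads = rng.normal(size=(NUM_DOMAINS, STYLE_DIM, IMG_DIM))     # style encoder E

def latent_style(z, domain):
    """Latent-guided synthesis: style code s = F(z, y)."""
    return F_heads[domain] @ z

def reference_style(ref_image, domain):
    """Reference-guided synthesis: style code s = E(x_ref, y)."""
    return E_heads[domain] @ ref_image

def generate(content, style):
    """Toy generator G(x, s): shifts the content by a style-dependent offset."""
    return content + style.sum() * np.ones_like(content)

content = rng.normal(size=IMG_DIM)
z1, z2 = rng.normal(size=LATENT_DIM), rng.normal(size=LATENT_DIM)

# Sampling different latent codes for the same input and domain yields
# different style codes, hence diverse outputs.
out1 = generate(content, latent_style(z1, domain=0))
out2 = generate(content, latent_style(z2, domain=0))

# A reference image drives the style instead when imitating a specific example.
ref = rng.normal(size=IMG_DIM)
out_ref = generate(content, reference_style(ref, domain=1))
```

The design point this mirrors is that one generator serves every domain; only the style-code source changes between the two evaluation protocols.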
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Image-to-Image Translation | Retinal Fundus-to-Angiogram (test) | FID 26.7 | 42 |
| Image-to-Image Translation | CelebA-HQ | FID 32.16 | 28 |
| Unpaired Image-to-Image Translation | Cat → Dog v1 (test) | FID 54.88 | 14 |
| Reference-guided image synthesis | AFHQ (test) | FID 19.78 | 13 |
| Reference-guided image synthesis | CelebA-HQ (test) | FID 19.58 | 12 |
| Segmentation | Chest MRI to CT | Accuracy 90.7 | 10 |
| Segmentation | Retinal OCT | Accuracy 75.4 | 10 |
| Segmentation | Cardiac MRI | Accuracy 94.4 | 10 |
| Latent-guided image synthesis | CelebA-HQ | FID 13.7 | 9 |
| Image Synthesis | Retinal OCT (test) | FID 174.2 | 9 |
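Most rows above report FID (Fréchet Inception Distance), where lower is better: it fits Gaussians to Inception-v3 features of real and generated images and measures the Fréchet distance between them. A minimal numpy sketch of the distance itself, given precomputed feature means and covariances (real evaluations extract these statistics from an Inception-v3 network):

```python
import numpy as np

def psd_sqrt(mat):
    """Matrix square root of a symmetric PSD matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # guard against tiny negative eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.T

def fid(mu1, cov1, mu2, cov2):
    """Frechet distance between N(mu1, cov1) and N(mu2, cov2):
    ||mu1 - mu2||^2 + Tr(cov1 + cov2 - 2 * sqrt(cov1^(1/2) cov2 cov1^(1/2)))."""
    s1 = psd_sqrt(cov1)
    covmean = psd_sqrt(s1 @ cov2 @ s1)
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(cov1 + cov2 - 2.0 * covmean))

rng = np.random.default_rng(0)
mu = rng.normal(size=4)
a = rng.normal(size=(4, 4))
cov = a @ a.T + np.eye(4)  # symmetric positive definite

score_same = fid(mu, cov, mu, cov)            # identical stats -> ~0
score_shifted = fid(mu, cov, mu + 1.0, cov)   # shifted mean -> positive
```

Identical distributions score (numerically) zero, and any mean or covariance mismatch increases the score, which is why lower FID indicates generated images whose feature statistics better match the real data.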