FineGAN: Unsupervised Hierarchical Disentanglement for Fine-Grained Object Generation and Discovery
About
We propose FineGAN, a novel unsupervised GAN framework, which disentangles the background, object shape, and object appearance to hierarchically generate images of fine-grained object categories. To disentangle the factors without supervision, our key idea is to use information theory to associate each factor with a latent code, and to condition the relationships between the codes in a specific way to induce the desired hierarchy. Through extensive experiments, we show that FineGAN achieves the desired disentanglement to generate realistic and diverse images belonging to fine-grained classes of birds, dogs, and cars. Using FineGAN's automatically learned features, we also cluster real images as a first attempt at solving the novel problem of unsupervised fine-grained object category discovery. Our code/models/demo can be found at https://github.com/kkanshul/finegan
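To make the idea of conditioning the relationships between the codes more concrete, here is a minimal sketch of how hierarchically linked latent codes could be sampled. It is an illustration under stated assumptions, not the released implementation: it assumes categorical codes, a fixed group of appearance codes belonging to each shape code, and (optionally) a background code tied to the appearance code. The function names `sample_codes` and `one_hot`, the parameter names, and the sizes used in the usage note are all hypothetical.

```python
import random


def sample_codes(num_shapes, num_appearances, rng, tie_background=True):
    """Sample (background, shape, appearance) categorical code indices.

    Assumption: appearance codes are split into equal, contiguous groups,
    one group per shape code, so the shape index is fully determined by
    the appearance index. This is one way to induce a hierarchy.
    """
    if num_appearances % num_shapes != 0:
        raise ValueError("num_appearances must be a multiple of num_shapes")
    appearances_per_shape = num_appearances // num_shapes

    # Draw the most specific (appearance) code first.
    appearance = rng.randrange(num_appearances)
    # The shape code is the group that the appearance code falls into.
    shape = appearance // appearances_per_shape
    # Assumption: the background code is either tied to the appearance
    # code or drawn independently from the same index range.
    if tie_background:
        background = appearance
    else:
        background = rng.randrange(num_appearances)
    return background, shape, appearance


def one_hot(index, size):
    """Encode a categorical index as a one-hot list of floats."""
    return [1.0 if i == index else 0.0 for i in range(size)]
```

For example, `sample_codes(20, 200, random.Random(0))` returns three indices in which the shape index always equals the appearance index divided by 10 (integer division), so every appearance code has exactly one shape code as its parent.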
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Fine-grained object category discovery | Stanford Cars (test) | Accuracy | 7.8 | 38 |
| Image Generation | CUB200 | FID | 11.25 | 10 |
| Image Generation | CUB Birds 200-2011 (test) | FID | 11.25 | 9 |
| Image Generation | Stanford Cars (test) | Fréchet Inception Distance | 16.03 | 9 |
| Fine-grained object category discovery | CUB-200 Birds 2011 (test) | NMI | 0.403 | 5 |
| Fine-grained object category discovery | Stanford Dogs (test) | NMI | 0.233 | 5 |
| Image Generation | Stanford Dogs (test) | IS | 46.92 | 5 |
| Image Generation | Stanford Dogs | FID | 25.66 | 2 |
| Image Generation | Stanford Cars | FID | 16.03 | 2 |
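
The category discovery rows above report NMI (normalized mutual information), which scores how well a clustering agrees with ground-truth labels regardless of how the clusters are numbered. The sketch below shows one common way to compute it. It assumes normalization by the arithmetic mean of the two entropies; the benchmark may use a different normalization, and the function name `nmi` is hypothetical.

```python
import math
from collections import Counter


def nmi(labels_true, labels_pred):
    """Normalized mutual information between two labelings.

    Assumption: mutual information is divided by the arithmetic mean of
    the two label entropies. Other normalizations (min, max, geometric
    mean) exist and give different values.
    """
    n = len(labels_true)
    if n == 0 or n != len(labels_pred):
        raise ValueError("labelings must be non-empty and of equal length")

    count_true = Counter(labels_true)
    count_pred = Counter(labels_pred)
    count_joint = Counter(zip(labels_true, labels_pred))

    def entropy(counts):
        # Shannon entropy (natural log) of the empirical distribution.
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    # Mutual information from the empirical joint distribution.
    mi = sum(
        (c / n) * math.log(c * n / (count_true[t] * count_pred[p]))
        for (t, p), c in count_joint.items()
    )
    mean_entropy = (entropy(count_true) + entropy(count_pred)) / 2
    # Convention: if both labelings have a single cluster, report 1.0.
    return mi / mean_entropy if mean_entropy > 0 else 1.0
```

A clustering that matches the ground truth up to a renaming of clusters scores close to 1, while a clustering that is statistically independent of the ground truth scores close to 0.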