Temporal Generative Adversarial Nets with Singular Value Clipping
About
In this paper, we propose a generative model, Temporal Generative Adversarial Nets (TGAN), which can learn a semantic representation of unlabeled videos and is capable of generating videos. Unlike existing Generative Adversarial Nets (GAN)-based methods that generate videos with a single generator consisting of 3D deconvolutional layers, our model exploits two different types of generators: a temporal generator and an image generator. The temporal generator takes a single latent variable as input and outputs a set of latent variables, each of which corresponds to an image frame in the video. The image generator transforms such a set of latent variables into a video. To deal with the instability of GAN training with such advanced networks, we adopt a recently proposed model, Wasserstein GAN, and propose a novel method to train it stably in an end-to-end manner. The experimental results demonstrate the effectiveness of our methods.
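The two-generator pipeline and the stabilization hinted at by the title (singular value clipping) can be illustrated with a toy NumPy sketch. All names, shapes, and the use of plain linear maps are illustrative assumptions, not the paper's actual architecture (which uses deconvolutional networks):

```python
import numpy as np

rng = np.random.default_rng(0)
Z_DIM, T, H, W = 16, 8, 4, 4  # toy latent size and video dimensions (assumptions)

# "Temporal generator": one latent z0 -> a sequence of T per-frame latents.
W_t = rng.standard_normal((T * Z_DIM, Z_DIM)) * 0.1

def temporal_generator(z0):
    return np.tanh(W_t @ z0).reshape(T, Z_DIM)

# "Image generator": z0 plus each frame latent z1_t -> one H x W frame;
# stacking the frames yields the video.
W_i = rng.standard_normal((H * W, 2 * Z_DIM)) * 0.1

def image_generator(z0, z1_seq):
    frames = [np.tanh(W_i @ np.concatenate([z0, z1])).reshape(H, W)
              for z1 in z1_seq]
    return np.stack(frames)  # video of shape (T, H, W)

# Singular value clipping (the stabilization named in the title): after each
# discriminator update, clip the singular values of a weight matrix to <= 1,
# enforcing a Lipschitz-style constraint.
def singular_value_clip(Wm, limit=1.0):
    U, s, Vt = np.linalg.svd(Wm, full_matrices=False)
    return U @ np.diag(np.minimum(s, limit)) @ Vt

z0 = rng.standard_normal(Z_DIM)
video = image_generator(z0, temporal_generator(z0))
```

Note that the image generator receives both the original latent `z0` and the per-frame latent, so every frame is conditioned on the same video-level code.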
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Video Generation | UCF-101 (test) | Inception Score: 15.83 | 105 |
| Video Generation | UCF-101 | FVD: 1.32e+3 | 54 |
| Class-conditioned Video Generation | UCF-101 (test) | -- | 19 |
| Video Generation | UCF-101 128x128, 16 frames | Inception Score: 11.85 | 17 |
| Unconditional Video Synthesis | UCF-101 128x128 | Inception Score: 11.85 | 12 |
| Video Generation | UCF-101 16-frame | IS: 11.85 | 12 |
| Video Generation | Human Actions | Inception Score: 3.65 | 9 |
| Text-to-Video Generation | MUG | Image Similarity (IS): 4.63 | 7 |
| Video Generation | MUG (test) | FID: 97.07 | 6 |
| Video Generation | Weizmann | FID: 99.85 | 6 |