Train Sparsely, Generate Densely: Memory-efficient Unsupervised Training of High-resolution Temporal GAN

About

Training a Generative Adversarial Network (GAN) on a video dataset is challenging because of the sheer size of the dataset and the complexity of each observation. In general, the computational cost of training a GAN scales exponentially with the resolution. In this study, we present a novel memory-efficient method for unsupervised learning on high-resolution video datasets whose computational cost scales only linearly with the resolution. We achieve this by designing the generator as a stack of small sub-generators and training each sub-generator with its own discriminator. During training, we insert between each pair of consecutive sub-generators an auxiliary subsampling layer that reduces the frame rate by a certain ratio. This procedure allows each sub-generator to learn the distribution of the video at a different level of resolution. We also need only a few GPUs to train a highly complex generator that far outperforms its predecessor in terms of Inception Score.
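To make the subsampling idea concrete, the following is a minimal sketch of how the video shape seen by each stage could shrink in time while growing in space. It is an illustration only, not the authors' implementation: the helper names (`subsample_frames`, `training_shapes`, `pixels`), the subsampling ratio of 2, the per-stage spatial upscaling of 2, and the 16-frame, 16x16 starting point are all assumptions chosen for the example.

```python
def subsample_frames(frames, ratio, offset=0):
    """Keep every `ratio`-th frame, starting at `offset` (hypothetical helper)."""
    if ratio < 1:
        raise ValueError("ratio must be >= 1")
    return frames[offset::ratio]


def training_shapes(num_frames, base_res, num_stages, ratio=2, upscale=2):
    """Return the (frames, height, width) shape produced at each stage in training.

    Assumption: every stage after the first is preceded by a subsampling
    layer with the given `ratio` and multiplies the spatial size by `upscale`.
    """
    shapes = []
    frames = list(range(num_frames))  # frame indices stand in for real frames
    res = base_res
    for stage in range(num_stages):
        if stage > 0:
            frames = subsample_frames(frames, ratio)
            res *= upscale
        shapes.append((len(frames), res, res))
    return shapes


def pixels(shape):
    """Number of pixels per channel in a (frames, height, width) video."""
    t, h, w = shape
    return t * h * w


shapes = training_shapes(num_frames=16, base_res=16, num_stages=4)
dense_final = (16, 128, 128)  # what the last stage would see without subsampling

for stage, shape in enumerate(shapes):
    print(f"stage {stage}: shape={shape}, pixels={pixels(shape)}")
print(f"dense final stage: pixels={pixels(dense_final)}")
```

Under these assumed settings the last stage handles 2 frames at 128x128 during training instead of all 16. At generation time the subsampling layers would simply be left out, so the full-frame-rate video is produced at the highest resolution.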

Masaki Saito, Shunta Saito, Masanori Koyama, Sosuke Kobayashi • 2018

Related benchmarks

Task	Dataset	Result
Video Generation	UCF-101 (test)	Inception Score 28.87	105
Video Generation	UCF101	FVD 1.21e+5	68
Class-conditioned Video Generation	UCF101 (test)	Fréchet Video Distance 1.21e+3	19
Video Prediction	UCF-101 (test)	FVD 1.43e+3	19
Video Generation	UCF101 128x128 16 frames	Inception Score 28.87	17
Video Generation	FaceForensics	FVD 58.03	15
Text-to-Video Generation	UCF-101 (fine-tuning)	IS 26.6	13
Unconditional video synthesis	UCF-101 128x128	Inception Score 26.6	12
Video Generation	UCF-101 64 x 64 (test)	FVD 1.21e+3	12
Video Generation	UCF-101 (train)	Inception Score 28.87	11
