
Adversarial Audio Synthesis

About

Audio signals are sampled at high temporal resolutions, and learning to synthesize audio requires capturing structure across a range of timescales. Generative adversarial networks (GANs) have seen wide success at generating images that are both locally and globally coherent, but they have seen little application to audio generation. In this paper we introduce WaveGAN, a first attempt at applying GANs to unsupervised synthesis of raw-waveform audio. WaveGAN is capable of synthesizing one-second slices of audio waveforms with global coherence, suitable for sound effect generation. Our experiments demonstrate that, without labels, WaveGAN learns to produce intelligible words when trained on a small-vocabulary speech dataset, and can also synthesize audio from other domains such as drums, bird vocalizations, and piano. We compare WaveGAN to a method that applies GANs designed for image generation to image-like audio feature representations, finding both approaches to be promising.

Chris Donahue, Julian McAuley, Miller Puckette • 2018

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| ECG Abnormality Classification | G12EC (test) | Specificity | 96 | 63 |
| Synthetic Time Series Generation | Electric Motor | Average ELBO | 1.54 | 48 |
| Synthetic Time Series Generation | ETT | Average ELBO | 1.5 | 48 |
| Synthetic Time Series Generation | MetroPT3 | Average ELBO | 0.35 | 48 |
| Synthetic Time Series Generation | ECG | Average ELBO | 1.33 | 48 |
| Synthetic Time Series Generation | Sine | Average ELBO Score | 0.6 | 48 |
| Time-series generation | Energy | Discriminative Score | 0.363 | 45 |
| Time-series generation | Stocks | Discriminative Score | 0.217 | 29 |
| Time-series generation | ETT | Discriminative Score | 38.5 | 24 |
| Synthetic Time Series Generation | MetroPT3 | FID Score | 1.14 | 24 |
Showing 10 of 33 rows
