EBEN: Extreme bandwidth extension network applied to speech signals captured with noise-resilient body-conduction microphones
About
In this paper, we present the Extreme Bandwidth Extension Network (EBEN), a generative adversarial network (GAN) that enhances audio measured with body-conduction microphones. This type of capture equipment suppresses ambient noise at the expense of speech bandwidth, so signal enhancement techniques are required to recover the wideband speech signal. EBEN leverages a multiband decomposition of the raw captured speech to reduce the time-domain dimension of the data and to give better control over the full-band signal. This multiband representation is fed to a U-Net-like model, which is trained with a combination of feature and adversarial losses to recover an enhanced audio signal. The proposed discriminator architecture also benefits from this original representation. Our approach achieves state-of-the-art results with a lightweight generator and real-time compatible operation.
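To make the multiband idea concrete, here is a minimal sketch of an analysis filter bank that splits a signal into `num_bands` sub-bands and decimates each by `num_bands`, so the time axis shrinks by that factor while the channel count grows. The abstract does not say which filter bank is used; the cosine-modulated (pseudo-QMF style) design, the Hann-windowed sinc prototype, and all names and parameters below (`prototype_lowpass`, `multiband_analysis`, `taps_per_band`) are assumptions chosen for illustration only.

```python
import math


def prototype_lowpass(num_bands, taps_per_band=16):
    """Hann-windowed sinc low-pass with cutoff 1 / (4 * num_bands) cycles/sample.

    Assumption: a simple windowed design, not an optimized prototype.
    """
    length = num_bands * taps_per_band
    center = (length - 1) / 2.0
    cutoff = 1.0 / (4 * num_bands)
    taps = []
    for n in range(length):
        t = n - center
        if t == 0:
            sinc = 2 * cutoff
        else:
            sinc = math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        hann = 0.5 - 0.5 * math.cos(2 * math.pi * n / (length - 1))
        taps.append(sinc * hann)
    return taps


def analysis_filters(num_bands, taps_per_band=16):
    """Shift the prototype to each band center by cosine modulation."""
    proto = prototype_lowpass(num_bands, taps_per_band)
    center = (len(proto) - 1) / 2.0
    bank = []
    for k in range(num_bands):
        phase = (-1) ** k * math.pi / 4
        step = (2 * k + 1) * math.pi / (2 * num_bands)
        bank.append([2 * h * math.cos(step * (n - center) + phase)
                     for n, h in enumerate(proto)])
    return bank


def multiband_analysis(signal, num_bands=4, taps_per_band=16):
    """Filter with each band filter and keep every num_bands-th output sample."""
    bank = analysis_filters(num_bands, taps_per_band)
    num_frames = len(signal) // num_bands
    bands = []
    for taps in bank:
        band = []
        for frame in range(num_frames):
            pos = frame * num_bands
            acc = 0.0
            for n, h in enumerate(taps):
                idx = pos - n
                if 0 <= idx < len(signal):  # zero-padding outside the signal
                    acc += h * signal[idx]
            band.append(acc)
        bands.append(band)
    return bands


# A low-frequency tone (0.05 cycles/sample) should land mostly in band 0.
signal = [math.sin(2 * math.pi * 0.05 * n) for n in range(256)]
bands = multiband_analysis(signal, num_bands=4)
energies = [sum(v * v for v in band) for band in bands]
print(len(bands), len(bands[0]), energies.index(max(energies)))
```

With four bands, a 256-sample input becomes four sequences of 64 samples each, which is the kind of time-domain reduction the abstract refers to. A network would then consume these sub-bands as input channels.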
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Speech Enhancement | VCTK Vibration sensor 12-bit, 4-16 kHz upsampling (test) | LSD (Log-Spectral Distance) | 1.15 | 18 |
| Speech Enhancement | VCTK Accelerometer 12-bit, 4-16 kHz upsampling (test) | LSD | 1.21 | 18 |
| Bandwidth Extension (4-22 kHz upsampling) | MagnaTagATune (test) | LSD | 1.17 | 15 |
| Bandwidth Extension (BWE) | VCTK Desktop | LSD | 1.13 | 10 |
| Bandwidth Extension (BWE) | VCTK Google Pixel7 | LSD | 1.16 | 10 |
| Speech Enhancement | Air- and Bone-Conducted Synchronized Speech corpus (test) | SI-SDR | 0.8 | 9 |
| Bandwidth Extension (4-22 kHz upsampling) | VCTK (test) | LSD | 1.19 | 7 |
| Bandwidth Extension (8-22 kHz upsampling) | VCTK (test) | LSD | 1.06 | 7 |
| Bandwidth Extension (8-22 kHz upsampling) | MagnaTagATune (test) | LSD | 1.08 | 6 |
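The benchmarks report two metrics, LSD and SI-SDR. As a rough guide to what they measure, the sketch below computes both for a pair of time-aligned signals. Definitions of LSD vary across papers (log base, window, frame size), so the frame length, hop, Hann window, and `log10` of the power spectrum used here are assumptions and may not match the settings behind the listed scores. The function names are hypothetical.

```python
import cmath
import math


def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant signal-to-distortion ratio in dB (higher is better)."""
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference)
    scale = dot / (ref_energy + eps)
    # Project the estimate onto the reference; the remainder counts as distortion.
    target = [scale * r for r in reference]
    noise = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(n * n for n in noise)
    return 10 * math.log10((target_energy + eps) / (noise_energy + eps))


def power_spectrogram(signal, frame_len=64, hop=32):
    """Naive Hann-windowed DFT power spectrogram (assumed STFT settings)."""
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        windowed = [signal[start + n] * (0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len))
                    for n in range(frame_len)]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            coeff = sum(w * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n, w in enumerate(windowed))
            spectrum.append(abs(coeff) ** 2)
        frames.append(spectrum)
    return frames


def log_spectral_distance(estimate, reference, eps=1e-10):
    """RMS log-power difference over frequency, averaged over frames (lower is better)."""
    est_frames = power_spectrogram(estimate)
    ref_frames = power_spectrogram(reference)
    per_frame = []
    for est, ref in zip(est_frames, ref_frames):
        squared = [(math.log10(r + eps) - math.log10(e + eps)) ** 2
                   for e, r in zip(est, ref)]
        per_frame.append(math.sqrt(sum(squared) / len(squared)))
    return sum(per_frame) / len(per_frame)


reference = [math.sin(2 * math.pi * 0.05 * n) for n in range(256)]
louder = [10 * r for r in reference]
print(log_spectral_distance(reference, reference))
print(log_spectral_distance(louder, reference))
print(si_sdr(louder, reference))
```

The rescaled copy illustrates the difference between the two metrics: LSD penalizes the level change because it compares log-power spectra directly, while SI-SDR ignores a pure gain by construction.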