Real-time Speech Frequency Bandwidth Extension

About

In this paper, we propose a lightweight model for frequency bandwidth extension of speech signals, increasing the sampling frequency from 8 kHz to 16 kHz while restoring the high-frequency content to a level almost indistinguishable from the 16 kHz ground truth. The model architecture is based on SEANet (Sound EnhAncement Network), a wave-to-wave fully convolutional model, which uses a combination of feature losses and adversarial losses to reconstruct an enhanced version of the input speech. In addition, we propose a variant of SEANet that can be deployed on-device in streaming mode, achieving an architectural latency of 16 ms. When profiled on a single core of a mobile CPU, processing one 16 ms frame takes only 1.5 ms. The low latency makes it viable for bi-directional voice communication systems.
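The timing figures in the abstract can be checked with a little arithmetic. The sketch below is illustrative only: the helper names `samples_per_frame` and `real_time_factor` are hypothetical and do not come from the paper, and it assumes the 16 ms frame is counted at the stated 8 kHz input and 16 kHz output rates.

```python
def samples_per_frame(frame_ms: float, sample_rate_hz: int) -> int:
    """Number of audio samples covered by one frame of the given duration."""
    return round(frame_ms * sample_rate_hz / 1000)


def real_time_factor(compute_ms: float, frame_ms: float) -> float:
    """Fraction of the frame duration spent computing (below 1.0 keeps up with real time)."""
    return compute_ms / frame_ms


# Figures quoted in the abstract: 16 ms frames, 1.5 ms of compute per frame.
FRAME_MS = 16.0
COMPUTE_MS = 1.5

input_samples = samples_per_frame(FRAME_MS, 8_000)    # narrowband input
output_samples = samples_per_frame(FRAME_MS, 16_000)  # wideband output
rtf = real_time_factor(COMPUTE_MS, FRAME_MS)

print(input_samples, output_samples, rtf)
```

Under these assumptions, each frame spans 128 input samples and 256 output samples, and compute takes roughly 9% of the frame duration on the profiled core.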

Yunpeng Li, Marco Tagliasacchi, Oleg Rybakov, Victor Ungureanu, Dominik Roblek • 2020

Related benchmarks

Task                     Dataset                       Result     Rank
Audio Super-Resolution   VCTK 8-16 kHz                 LSD 0.79   6
Audio Super-Resolution   VCTK 4-16 kHz                 LSD 0.99   6
Audio Super-Resolution   MusDB 11.025-44.1 kHz         LSD 1.13   6
Audio Super-Resolution   VCTK 8-24 kHz                 LSD 0.91   5
Audio Super-Resolution   VCTK 12-48 kHz (test)         LSD 0.86   4
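The benchmark results are reported as LSD (log-spectral distance), where lower is better. This page does not say which exact definition the benchmarks use, so the following is a minimal sketch of one commonly used form: the per-frame root-mean-square difference of log10 power spectra, averaged over frames. The function name `log_spectral_distance`, the `eps` floor, and the nested-list input format are assumptions made for illustration; computing the power spectrograms (for example with an STFT) is left out.

```python
import math


def log_spectral_distance(ref_power, est_power, eps=1e-10):
    """Mean over frames of the RMS log10-power difference across frequency bins.

    ref_power, est_power: lists of frames, each frame a list of non-negative
    power-spectrum values (same shape for both inputs).
    eps: small floor to avoid log10(0); an illustrative choice.
    """
    if not ref_power or len(ref_power) != len(est_power):
        raise ValueError("inputs must be non-empty and have the same number of frames")
    total = 0.0
    for ref_frame, est_frame in zip(ref_power, est_power):
        if not ref_frame or len(ref_frame) != len(est_frame):
            raise ValueError("frames must be non-empty and have the same number of bins")
        squared_diffs = [
            (math.log10(r + eps) - math.log10(e + eps)) ** 2
            for r, e in zip(ref_frame, est_frame)
        ]
        total += math.sqrt(sum(squared_diffs) / len(squared_diffs))
    return total / len(ref_power)


# Toy example: the estimate has 10x the reference power in every bin,
# so every log10 difference is about 1 and the distance is about 1.
reference = [[1.0, 1.0], [1.0, 1.0]]
estimate = [[10.0, 10.0], [10.0, 10.0]]
print(log_spectral_distance(reference, estimate))
```

Identical spectrograms give a distance of 0, and the value grows as the estimated spectrum departs from the reference on a log scale.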