
MarbleNet: Deep 1D Time-Channel Separable Convolutional Neural Network for Voice Activity Detection

About

We present MarbleNet, an end-to-end neural network for Voice Activity Detection (VAD). MarbleNet is a deep residual network built from blocks of 1D time-channel separable convolution, batch normalization, ReLU, and dropout layers. Compared to a state-of-the-art VAD model, MarbleNet achieves similar performance with roughly one-tenth of the parameters. We further conduct extensive ablation studies on different training methods and parameter choices to study the robustness of MarbleNet in real-world VAD tasks.
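To illustrate why time-channel separable convolutions are cheap in parameters, the sketch below compares the weight count of a standard 1D convolution with that of a separable one (a depthwise convolution over time followed by a pointwise 1x1 convolution across channels). The channel count and kernel size are hypothetical values chosen for illustration, not MarbleNet's actual configuration, and biases are ignored.

```python
def standard_conv1d_params(c_in: int, c_out: int, kernel_size: int) -> int:
    """Weights in a standard 1D convolution: every output channel
    has one kernel of length kernel_size per input channel."""
    return c_in * c_out * kernel_size


def separable_conv1d_params(c_in: int, c_out: int, kernel_size: int) -> int:
    """Weights in a time-channel separable 1D convolution."""
    depthwise = c_in * kernel_size  # one temporal kernel per input channel
    pointwise = c_in * c_out        # 1x1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes (assumptions, not taken from the paper).
C_IN, C_OUT, KERNEL = 64, 64, 13

standard = standard_conv1d_params(C_IN, C_OUT, KERNEL)
separable = separable_conv1d_params(C_IN, C_OUT, KERNEL)

print(f"standard:  {standard} weights")
print(f"separable: {separable} weights")
print(f"reduction: {standard / separable:.1f}x")
```

For these illustrative sizes, the separable layer needs 4,928 weights against 53,248 for the standard layer. The saving grows with kernel size, because the temporal kernel is no longer replicated for every pair of input and output channels.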

Fei Jia, Somshubra Majumdar, Boris Ginsburg • 2020

Related benchmarks

Task                       Dataset              Result           Rank
Voice Activity Detection   AVA-Speech (test)    AUC-ROC 85.8     7
Voice Activity Detection   HAVIC (test)         AUC-ROC 0.804    5
