
Deep Residual Learning for Small-Footprint Keyword Spotting

About

We explore the application of deep residual learning and dilated convolutions to the keyword spotting task, using the recently released Google Speech Commands Dataset as our benchmark. Our best residual network (ResNet) implementation significantly outperforms Google's previous convolutional neural networks in terms of accuracy. By varying model depth and width, we can achieve compact models that also outperform previous small-footprint variants. To our knowledge, we are the first to examine these approaches for keyword spotting, and our results establish an open-source state-of-the-art reference to support the development of future speech-based interfaces.
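To make the two ingredients concrete, here is a minimal single-channel, 1-D sketch of a residual block built from dilated convolutions, plus two helper formulas: how dilation enlarges the receptive field of a stack of stride-1 convolutions, and how a rough weight count scales with depth and width. This is not the authors' implementation. The 1-D setting, the absence of batch normalization and multiple channels, and all function names (`dilated_conv1d`, `residual_block`, `receptive_field`, `approx_conv_weights`) are hypothetical simplifications chosen for illustration.

```python
from typing import List, Sequence


def dilated_conv1d(x: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """Same-length 1-D cross-correlation with zero padding and the given dilation.

    Assumes an odd kernel length so the kernel is centered on each output position.
    """
    center = (len(kernel) - 1) // 2
    out: List[float] = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t + (j - center) * dilation  # kernel taps are spaced `dilation` samples apart
            if 0 <= idx < len(x):  # positions outside the input act as zero padding
                acc += w * x[idx]
        out.append(acc)
    return out


def relu(v: Sequence[float]) -> List[float]:
    return [max(0.0, a) for a in v]


def residual_block(x: Sequence[float], k1: Sequence[float], k2: Sequence[float],
                   dilation: int) -> List[float]:
    """Hypothetical block: conv -> ReLU -> conv, add the identity shortcut, then ReLU."""
    h = relu(dilated_conv1d(x, k1, dilation))
    h = dilated_conv1d(h, k2, dilation)
    return relu([a + b for a, b in zip(h, x)])  # shortcut keeps the input's length


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Receptive field of stacked stride-1 convolutions: 1 + sum((k - 1) * d)."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def approx_conv_weights(num_layers: int, channels: int, kernel_size: int = 3) -> int:
    """Rough weight count for equal-width k x k conv layers (ignores biases, BN, input/output layers)."""
    return num_layers * kernel_size * kernel_size * channels * channels


if __name__ == "__main__":
    print(residual_block([1.0, -1.0, 2.0, 0.0], [0.0, 1.0, 0.0], [0.0, 1.0, 0.0], dilation=2))
    print(receptive_field(3, [1, 1, 1, 1]), receptive_field(3, [1, 2, 4, 8]))
    print(approx_conv_weights(6, 45), approx_conv_weights(6, 19))
```

With four 3-tap layers, doubling the dilation at each layer widens the receptive field from 9 to 31 samples at no extra parameter cost, which is why dilation is attractive for small-footprint models. The weight-count helper grows linearly with depth but quadratically with width, so narrowing the layers is the stronger lever for shrinking a model.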

Raphael Tang, Jimmy Lin · 2017

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Keyword Spotting | Google Speech Commands v1 (test) | Accuracy (%) | 95.8 | 68 |
| Keyword Spotting | Google Speech Commands (test) | Accuracy (%) | 95.8 | 61 |
| Keyword Spotting | Google Speech Commands V2-35 | Accuracy (%) | 96.4 | 42 |
| Keyword Spotting | Google Speech Commands | Accuracy (%) | 95.8 | 23 |
| Keyword Spotting | Google Speech Commands V2-12 2018 | Accuracy (%) | 98 | 16 |
| Keyword Spotting | Google Speech Commands 12 classes v1 (test) | Accuracy (%) | 95.8 | 13 |
| Keyword Spotting | Far-field Command (test) | Accuracy, clean (%) | 89.45 | 8 |
| Keyword Spotting | Google Speech Commands 12 V2 (Official) | Accuracy (%) | 96.48 | 8 |