A neural attention model for speech command recognition

About

This paper introduces a convolutional recurrent network with attention for speech command recognition. Attention models are powerful tools for improving performance on natural language, image captioning and speech tasks. The proposed model establishes a new state-of-the-art accuracy of 94.1% on the Google Speech Commands dataset V1 and 94.5% on V2 (for the 20-command recognition task), while keeping a small footprint of only 202K trainable parameters. Results are compared with previous convolutional implementations on 5 different tasks: 20-command recognition (V1 and V2), 12-command recognition (V1), 35-word recognition (V1) and left-right (V1). We show detailed performance results and demonstrate that the proposed attention mechanism not only improves performance but also allows inspecting which regions of the audio the network took into consideration when outputting a given category.

Douglas Coimbra de Andrade, Sabato Leo, Martin Loesener Da Silva Viana, Christoph Bernkopf • 2018
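
The abstract's point that attention lets one inspect which audio regions influenced a prediction can be illustrated with a minimal sketch. This is a generic dot-product attention pooling step written for illustration only, not the authors' implementation: the function names, the `frames` and `query` inputs, and the toy values are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames, query):
    """Dot-product attention over per-frame feature vectors.

    frames: list of T feature vectors, hypothetical stand-ins for the
            per-frame outputs of a recurrent layer.
    query:  one vector of the same size (hypothetical).
    Returns (context, weights), where weights sum to 1 over the T frames.
    """
    # One scalar score per frame: how well the frame matches the query.
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frames]
    weights = softmax(scores)
    # Context vector: attention-weighted average of the frames.
    dim = len(frames[0])
    context = [
        sum(w * frame[d] for w, frame in zip(weights, frames))
        for d in range(dim)
    ]
    return context, weights


# Toy input (made up): 4 frames with 2 features each.
frames = [[0.1, 0.0], [0.2, 0.1], [2.0, 1.5], [0.0, 0.3]]
query = [1.0, 1.0]

context, weights = attention_pool(frames, query)

# The weights are what makes the model inspectable: the largest weight
# marks the frame that contributed most to the pooled representation.
most_attended = max(range(len(weights)), key=lambda t: weights[t])
print("most attended frame:", most_attended)
print("attention weights:", [round(w, 3) for w in weights])
```

In a trained model the weights would be plotted against time alongside the waveform or spectrogram; here the frame with the largest query match simply receives the largest weight.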

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Keyword Spotting | Google Speech Commands v1 (test) | Accuracy | 96.9 | 68 |
| Keyword Spotting | Google Speech Commands V2-35 | Accuracy | 93.9 | 42 |
| Keyword Spotting | Google Speech Commands V2 (test) | Accuracy | 96.9 | 39 |
| Keyword Spotting | Speech Commands KS2 v2 | Accuracy | 94.3 | 23 |
| Speech Command Recognition | Google Speech Command Dataset 20-cmd V2 (test) | Accuracy | 95.06 | 19 |
| Keyword Spotting | Google Speech Commands V2-12 2018 | Accuracy | 96.9 | 16 |
| Speech Command Recognition | Google Speech Command Dataset 20-cmd V1 (test) | Accuracy | 0.941 | 6 |
| Spoken-term recognition | Google Commands noise setting (test) | Accuracy | 94.21 | 6 |
| Speech Command Recognition | Google Speech Command Dataset 35-word V1 (test) | Accuracy | 94.3 | 5 |
| Speech Command Recognition | Google Speech Command Dataset left/right V1 (test) | Accuracy | 0.992 | 5 |
Showing 10 of 13 rows
