
A Better and Faster End-to-End Model for Streaming ASR

About

End-to-end (E2E) models have been shown to outperform state-of-the-art conventional models for streaming speech recognition [1] across many dimensions, including quality (as measured by word error rate (WER)) and endpointer latency [2]. However, the model still tends to delay its predictions towards the end and thus has much higher partial latency than a conventional ASR model. To address this issue, we look at encouraging the E2E model to emit words early, through an algorithm called FastEmit [3]. Naturally, improving latency results in a quality degradation. To address this, we first explore replacing the LSTM layers in the encoder of our E2E model with Conformer layers [4], which have shown good improvements for ASR. Second, we explore running a 2nd-pass beam search to improve quality. To ensure the 2nd pass completes quickly, we explore non-causal Conformer layers that feed into the same 1st-pass RNN-T decoder, an algorithm called Cascaded Encoders [5]. Overall, we find that the Conformer RNN-T with Cascaded Encoders offers a better quality and latency tradeoff for streaming ASR.
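
The cascaded arrangement described above can be pictured as a simple data flow: a causal encoder feeds the decoder for the streaming 1st pass, and a non-causal encoder stacked on top of it feeds the same decoder for the 2nd pass. The following is a toy sketch under loud assumptions, not the authors' implementation: the running mean, the look-ahead window, the threshold "decoder", and every function name are hypothetical stand-ins for the Conformer layers and RNN-T decoder named in the abstract.

```python
from typing import List, Tuple

Frames = List[float]


def causal_encoder(frames: Frames) -> Frames:
    """Toy streaming encoder: output t depends only on frames[0..t]."""
    out: Frames = []
    running = 0.0
    for t, x in enumerate(frames, start=1):
        running += x
        out.append(running / t)  # running mean over past and current frames
    return out


def noncausal_encoder(causal_out: Frames, right_context: int = 2) -> Frames:
    """Toy second encoder stacked on the causal one.

    Output t averages causal_out[t .. t + right_context], so it uses
    future frames (hypothetical stand-in for non-causal layers).
    """
    out: Frames = []
    for t in range(len(causal_out)):
        window = causal_out[t : t + right_context + 1]
        out.append(sum(window) / len(window))
    return out


def shared_decoder(encoded: Frames) -> List[int]:
    """Stand-in for the single shared decoder: thresholds each frame."""
    return [1 if v > 0.5 else 0 for v in encoded]


def cascaded_passes(
    frames: Frames, right_context: int = 2
) -> Tuple[List[int], List[int]]:
    """Run both passes; the same decoder consumes either encoder output."""
    first = causal_encoder(frames)                    # streaming, 1st pass
    second = noncausal_encoder(first, right_context)  # look-ahead, 2nd pass
    return shared_decoder(first), shared_decoder(second)
```

In this sketch, changing a late input frame leaves the earlier causal outputs untouched but alters the non-causal outputs that look ahead to it, which is the property that separates the two passes.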

Bo Li, Anmol Gulati, Jiahui Yu, Tara N. Sainath, Chung-Cheng Chiu, Arun Narayanan, Shuo-Yiin Chang, Ruoming Pang, Yanzhang He, James Qin, Wei Han, Qiao Liang, Yu Zhang, Trevor Strohman, Yonghui Wu • 2020

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Automatic Speech Recognition | LibriSpeech (test-other) | WER 2.6 | 966 |
| Automatic Speech Recognition | LibriSpeech clean (test) | WER 1.4 | 833 |
| Speech Recognition | WSJ (92-eval) | WER 1.3 | 131 |
| Automatic Speech Recognition | SWITCHBOARD swbd | WER 4.3 | 39 |
| Automatic Speech Recognition | TED-LIUM (test) | WER 5.2 | 19 |
| Automatic Speech Recognition | AMI IHM | WER 9 | 10 |
| Speech Recognition | YouTube (test) | WER 9.1 | 10 |
| Automatic Speech Recognition | AMI SDM English (eval) | WER 21.2 | 8 |
| Automatic Speech Recognition | Switchboard Fisher (CH) | WER 0.068 | 6 |
| Automatic Speech Recognition | Common Voice+ (test) | WER (%) 8.4 | 6 |
Showing 10 of 11 rows
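
The benchmark results above are all reported as word error rate (WER). As a reminder of how that metric is commonly computed, here is a minimal sketch: the word-level edit distance (substitutions, deletions, insertions) between a reference and a hypothesis transcript, divided by the number of reference words. The function name and whitespace tokenization are assumptions for illustration; benchmark toolkits usually apply their own text normalization first, so this is not the scoring pipeline behind the numbers listed here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a fraction (multiply by 100 for a percentage)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

For example, scoring the hypothesis "the cat sit on mat" against the reference "the cat sat on the mat" gives one substitution and one deletion over six reference words.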
