WeNet 2.0: More Productive End-to-End Speech Recognition Toolkit

About

Recently, we made available WeNet, a production-oriented end-to-end speech recognition toolkit, which introduces a unified two-pass (U2) framework and a built-in runtime to address the streaming and non-streaming decoding modes in a single model. To further improve ASR performance and facilitate various production requirements, in this paper, we present WeNet 2.0 with four important updates. (1) We propose U2++, a unified two-pass framework with bidirectional attention decoders, which includes the future contextual information by a right-to-left attention decoder to improve the representative ability of the shared encoder and the performance during the rescoring stage. (2) We introduce an n-gram based language model and a WFST-based decoder into WeNet 2.0, promoting the use of rich text data in production scenarios. (3) We design a unified contextual biasing framework, which leverages user-specific context (e.g., contact lists) to provide rapid adaptation ability for production and improves ASR accuracy in both with-LM and without-LM scenarios. (4) We design a unified IO to support large-scale data for effective model training. In summary, the brand-new WeNet 2.0 achieves up to 10% relative recognition performance improvement over the original WeNet on various corpora and makes available several important production-oriented features.
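
To make the rescoring idea in update (1) concrete, here is a minimal sketch of how first-pass CTC candidates might be re-ranked using scores from both a left-to-right and a right-to-left attention decoder. The abstract does not give the combination formula, so the linear interpolation below is an assumption, and the names `Hypothesis`, `rescore`, `ctc_weight`, and `reverse_weight` as used here are hypothetical illustrations rather than the toolkit's API. All scores are assumed to be log-probabilities.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    """One n-best candidate from the first (CTC) pass. Hypothetical structure."""
    text: str
    ctc_score: float  # log-probability from the first-pass CTC search
    l2r_score: float  # log-probability from the left-to-right attention decoder
    r2l_score: float  # log-probability from the right-to-left attention decoder


def rescore(hyps: List[Hypothesis],
            ctc_weight: float = 0.5,
            reverse_weight: float = 0.3) -> Hypothesis:
    """Return the candidate with the best combined score.

    Assumed combination (not taken from the abstract): interpolate the two
    attention-decoder scores with reverse_weight, then add the weighted
    CTC score.
    """
    def combined(h: Hypothesis) -> float:
        attention = ((1.0 - reverse_weight) * h.l2r_score
                     + reverse_weight * h.r2l_score)
        return attention + ctc_weight * h.ctc_score

    return max(hyps, key=combined)


# Toy scores, invented for illustration only.
candidates = [
    Hypothesis("A", ctc_score=-1.0, l2r_score=-2.0, r2l_score=-5.0),
    Hypothesis("B", ctc_score=-1.2, l2r_score=-2.5, r2l_score=-2.0),
]

best_bidirectional = rescore(candidates)                  # uses both decoders
best_l2r_only = rescore(candidates, reverse_weight=0.0)   # ignores right-to-left
```

With these toy numbers, the left-to-right decoder alone prefers candidate "A", while adding the right-to-left score shifts the choice to "B". That is the kind of effect the future context from a right-to-left decoder is meant to have during rescoring.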

Binbin Zhang, Di Wu, Zhendong Peng, Xingchen Song, Zhuoyuan Yao, Hang Lv, Lei Xie, Chao Yang, Fuping Pan, Jianwei Niu • 2022

Related benchmarks

Task                          Dataset                      Result      Rank
Automatic Speech Recognition  LibriSpeech (test-other)     WER 6.53    966
Automatic Speech Recognition  LibriSpeech clean (test)     WER 2.66    833
Automatic Speech Recognition  Librispeech (test-clean)     WER 2.66    84
Automatic Speech Recognition  AISHELL-1 (test)             CER 4.61    71
Automatic Speech Recognition  WenetSpeech Meeting (test)   CER 15.59   45
Automatic Speech Recognition  GigaSpeech (test)            WER 10.6    40
Automatic Speech Recognition  WenetSpeech Net (test)       CER 9.7     25
Automatic Speech Recognition  GigaSpeech (dev)             WER 0.107   22
Automatic Speech Recognition  AISHELL (test)               CER 4.4     20
Automatic Speech Recognition  AISHELL-2 (test_ios)         CER 5.35    20
Showing 10 of 12 rows
