E-Branchformer: Branchformer with Enhanced merging for speech recognition

About

Conformer, which combines convolution and self-attention sequentially to capture both local and global information, has shown remarkable performance and is currently regarded as the state-of-the-art for automatic speech recognition (ASR). Several other studies have explored integrating convolution and self-attention, but they have not managed to match Conformer's performance. The recently introduced Branchformer achieves performance comparable to Conformer by using dedicated branches of convolution and self-attention and merging the local and global context from each branch. In this paper, we propose E-Branchformer, which enhances Branchformer by applying an effective merging method and stacking additional point-wise modules. E-Branchformer sets new state-of-the-art word error rates (WERs) of 1.81% and 3.65% on the LibriSpeech test-clean and test-other sets, respectively, without using any external training data.

Kwangyoun Kim, Felix Wu, Yifan Peng, Jing Pan, Prashant Sridhar, Kyu J. Han, Shinji Watanabe • 2022

Related benchmarks

Task                          Dataset                    Result    Rank
Automatic Speech Recognition  LibriSpeech (test-other)   WER 3.65  966
Automatic Speech Recognition  LibriSpeech clean (test)   WER 1.81  833
Automatic Speech Recognition  LibriSpeech (test-clean)   WER 2.14  84
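The results above are reported as word error rate (WER). For readers unfamiliar with the metric, here is a minimal sketch of the standard definition: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; this is not the scoring pipeline used in the paper, which may normalize text differently and aggregates errors over the whole test set.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Illustrative WER: word-level Levenshtein distance / reference length."""
    ref = reference.split()  # assumption: plain whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)
```

Multiplying the returned value by 100 gives a percentage comparable in form to the figures in the table. Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.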
