CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition

About

In this paper, we propose a novel soft and monotonic alignment mechanism for sequence transduction. It is inspired by the integrate-and-fire model in spiking neural networks and is employed in the encoder-decoder framework; because it consists of continuous functions, it is named Continuous Integrate-and-Fire (CIF). Applied to the ASR task, CIF not only has a concise calculation but also supports online recognition and acoustic boundary positioning, making it suitable for various ASR scenarios. Several support strategies are also proposed to alleviate problems unique to CIF-based models. With these methods combined, the CIF-based model shows competitive performance. Notably, it achieves a word error rate (WER) of 2.86% on the test-clean set of LibriSpeech and sets a new state-of-the-art result on a Mandarin telephone ASR benchmark.

Linhao Dong, Bo Xu • 2019
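
The abstract describes CIF only at a high level, so the following is a minimal sketch of an integrate-and-fire style accumulation rather than the authors' implementation. It assumes each encoder frame comes with a scalar weight in [0, 1], that a firing threshold of 1.0 is used, and that a frame's weight is split between the current and the next output when the threshold is crossed. The function name `cif_sketch` and its parameters are hypothetical.

```python
from typing import List, Sequence


def cif_sketch(
    frames: Sequence[Sequence[float]],
    weights: Sequence[float],
    threshold: float = 1.0,  # assumed firing threshold
) -> List[List[float]]:
    """Accumulate weighted frames and emit one vector each time the
    accumulated weight reaches the threshold (illustrative only)."""
    dim = len(frames[0]) if frames else 0
    outputs: List[List[float]] = []
    acc_weight = 0.0           # weight integrated since the last firing
    acc_state = [0.0] * dim    # weighted sum of frames since the last firing

    for frame, weight in zip(frames, weights):
        if acc_weight + weight < threshold:
            # Keep integrating: the threshold has not been reached yet.
            acc_weight += weight
            acc_state = [s + weight * x for s, x in zip(acc_state, frame)]
        else:
            # Fire: spend just enough of this frame's weight to reach the
            # threshold, then carry the remainder over to the next output.
            used = threshold - acc_weight
            outputs.append([s + used * x for s, x in zip(acc_state, frame)])
            rest = weight - used
            acc_weight = rest
            acc_state = [rest * x for x in frame]

    # Any weight left below the threshold at the end is dropped here;
    # how to treat that tail is a design choice this sketch does not settle.
    return outputs


if __name__ == "__main__":
    emitted = cif_sketch([[1.0], [3.0], [2.0], [4.0]], [0.5, 0.5, 0.25, 0.75])
    print(emitted)
```

Because the accumulation runs left to right and each firing depends only on frames seen so far, a loop of this shape is monotonic and can run frame by frame, which matches the online-recognition property the abstract mentions. The positions at which firings occur are what would serve as boundary estimates.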

Related benchmarks

Task	Dataset	Metric	Result	
Automatic Speech Recognition	AISHELL-1 (test)	CER	4.8	177
Automatic Speech Recognition	AISHELL-1 (dev)	CER	4.4	66
Automatic Speech Recognition	AISHELL-2 (test_ios)	CER	5.8	35
Automatic Speech Recognition	FLEURS (test)	Average Error Rate	12.89	30
Automatic Speech Recognition	MLC-SLM (dev)	WER/CER	18.95	21
Automatic Speech Recognition	AISHELL-2 mic	CER	6.3	12
Automatic Speech Recognition	AISHELL-2 android	CER	6.2	6
Multilingual Automatic Speech Recognition	CommonVoice (test)	WER	18.45	6

