
Phoneme-BERT: Joint Language Modelling of Phoneme Sequence and ASR Transcript

About

Recent years have witnessed significant improvements in the ability of ASR systems to recognize spoken utterances. However, recognition remains challenging on noisy and out-of-domain data, where substitution and deletion errors are prevalent in the transcribed text. These errors significantly degrade the performance of downstream tasks. In this work, we propose a BERT-style language model, referred to as PhonemeBERT, that jointly models the phoneme sequence and the ASR transcript to learn phonetic-aware representations that are robust to ASR errors. We show that PhonemeBERT can be used on downstream tasks with phoneme sequences as additional features, and also in a low-resource setup where only ASR transcripts are available for the downstream tasks, with no phoneme information. We evaluate our approach extensively by generating noisy data for three benchmark datasets - Stanford Sentiment Treebank, TREC and ATIS for sentiment, question and intent classification tasks respectively. The proposed approach beats the state-of-the-art baselines comprehensively on each dataset.
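The abstract describes joint modelling of a phoneme sequence alongside the ASR transcript in a BERT-style input. A minimal sketch of how two such views could be packed into one sequence is shown below; the special tokens, segment convention, and helper names are illustrative assumptions in the spirit of standard BERT-style sentence-pair inputs, not the paper's actual tokenizer or architecture.

```python
# Hypothetical sketch: pack an ASR word transcript and its phoneme
# sequence into a single BERT-style input, separated by [SEP], so a
# joint encoder can attend across both views. Token names and the
# segment convention follow standard BERT sentence-pair practice and
# are assumptions, not the paper's exact format.

def build_joint_input(asr_words, phonemes):
    """Pack ASR transcript and phoneme sequence into one token list."""
    return ["[CLS]"] + list(asr_words) + ["[SEP]"] + list(phonemes) + ["[SEP]"]

def segment_ids(joint_tokens):
    """Segment 0 for the word view, segment 1 for the phoneme view.

    Follows the BERT convention that the first [SEP] belongs to
    segment 0 and everything after it to segment 1.
    """
    ids, seg = [], 0
    for tok in joint_tokens:
        ids.append(seg)
        if tok == "[SEP]":
            seg = 1
    return ids

tokens = build_joint_input(
    ["book", "a", "flight"],
    ["B", "UH", "K", "AH", "F", "L", "AY", "T"],
)
segs = segment_ids(tokens)
```

A downstream intent classifier would then consume the encoder's representation of `[CLS]`; in the low-resource setup the paper mentions, the phoneme segment is simply absent and only the word view is fed in.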

Mukuntha Narayanan Sundararaman, Ayush Kumar, Jithendra Vepa • 2021

Related benchmarks

Task              Dataset  Metric       Result  Rank
Intent Detection  ATIS     ID Accuracy  95.14   27
Intent Detection  SLURP    Accuracy     84.16   16
Intent Detection  TREC6    Accuracy     86.48   10
