
Breaking the Script Barrier: Enabling Automatic Alignment for PoS-based ASR Error Analysis in Non-Latin Scripts

About

Automatic Speech Recognition (ASR) systems are commonly evaluated using aggregate metrics such as Word Error Rate (WER), which do not capture the linguistic structure of errors. Fine-grained analysis, such as Part-of-Speech (PoS)-wise error characterization, requires accurate alignment between ASR hypotheses and reference transcriptions. However, existing alignment tools are often unreliable for languages written in non-Latin scripts. In this work, we address this gap by proposing a robust, automated, language-agnostic alignment mechanism applicable across ASR architectures and across languages written in both Latin and non-Latin scripts. This enables consistent alignment of hypotheses, references, and evaluation sequences, forming the basis for downstream linguistic analysis. Building on this, we employ standard PoS taggers to perform scalable and reproducible PoS-wise error analysis. Notably, we perform alignment and downstream ASR error analysis across three major segmented writing systems, namely, Abugida (Tamil, Hindi, Kannada), Alphabetic (English, Russian, Greek), and Abjad (Arabic). We further demonstrate how such error information can be leveraged during ASR training to improve metrics such as WER.
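
To make the idea of aligning hypotheses with references and then counting errors per PoS tag concrete, here is a minimal sketch. It is not the alignment mechanism proposed in the paper, whose details the abstract does not give. It assumes plain whitespace tokenization, Unicode NFC normalization so that visually identical words in any script compare equal, and a standard word-level Levenshtein alignment. The function names (`tokenize`, `align`, `errors_by_pos`) and the hand-written PoS tags are hypothetical placeholders; in practice the tags would come from a PoS tagger run on the reference.

```python
import unicodedata
from collections import Counter


def tokenize(text):
    # Assumption: NFC normalization plus whitespace splitting is enough to
    # make composed and decomposed forms of the same word compare equal.
    return unicodedata.normalize("NFC", text).split()


def align(ref, hyp):
    """Word-level Levenshtein alignment.

    Returns a list of (op, ref_word, hyp_word) with op in
    {"C" correct, "S" substitution, "D" deletion, "I" insertion}.
    """
    n, m = len(ref), len(hyp)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Backtrace from the bottom-right cell to recover one optimal alignment.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])):
            op = "C" if ref[i - 1] == hyp[j - 1] else "S"
            ops.append((op, ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            ops.append(("D", ref[i - 1], None))
            i -= 1
        else:
            ops.append(("I", None, hyp[j - 1]))
            j -= 1
    ops.reverse()
    return ops


def errors_by_pos(ops, ref_tags):
    """Count substitutions and deletions per reference PoS tag.

    Insertions have no reference word, so they are counted under tag None.
    """
    counts = Counter()
    k = 0  # index into ref_tags, advanced for every op that consumes a ref word
    for op, _, _ in ops:
        if op == "I":
            counts[("I", None)] += 1
            continue
        if op != "C":
            counts[(op, ref_tags[k])] += 1
        k += 1
    return counts


# Illustrative Russian example; the tags are hand-written placeholders.
ref = tokenize("я вижу большой дом")
hyp = tokenize("я вижу дом")
ops = align(ref, hyp)
pos_errors = errors_by_pos(ops, ["PRON", "VERB", "ADJ", "NOUN"])
```

In this example the hypothesis drops the adjective, so the alignment contains a single deletion and `pos_errors` attributes it to the `ADJ` tag. Because the comparison operates on normalized Unicode strings, the same code path applies unchanged to Latin and non-Latin scripts, though scripts without whitespace word boundaries would need a different tokenizer.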

Prasenjit K Mudi, Dahlia Devapriya, Sheetal Kalyani • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Automatic Speech Recognition | Arabic Dataset | Deletions Count | 0.00e+0 | 51 |
| Automatic Speech Recognition | SPRING-INX Tamil | D Component Count | 1 | 24 |
| Automatic Speech Recognition | Russian propn | Substitution Errors (D) | 361 | 2 |
| Automatic Speech Recognition | Russian adp | Substitution Count (D) | 206 | 2 |
| Automatic Speech Recognition | Russian cconj | Substitution Error Count (D) | 142 | 2 |
| Automatic Speech Recognition | Russian x | Substitution Count (D) | 5 | 2 |
| Automatic Speech Recognition | Russian part subset | Substitution Count (D) | 16 | 2 |
| Automatic Speech Recognition | Russian pron | Substitution Error (D) | 24 | 2 |
| Automatic Speech Recognition | Russian verb | Substitution Count | 4 | 2 |
| Automatic Speech Recognition | Russian adj | Substitution Error (D) | 0.00e+0 | 2 |
