Breaking the Script Barrier: Enabling Automatic Alignment for PoS-based ASR Error Analysis in Non-Latin Scripts
About
Automatic Speech Recognition (ASR) systems are commonly evaluated using aggregate metrics such as Word Error Rate (WER), which do not capture the linguistic structure of errors. Fine-grained analysis, such as Part-of-Speech (PoS)-wise error characterization, requires accurate alignment between ASR hypotheses and reference transcriptions. However, existing alignment tools are often unreliable for languages written in non-Latin scripts. In this work, we address this gap by proposing a robust, automated, language-agnostic alignment mechanism applicable across ASR architectures and across languages written in both Latin and non-Latin scripts. This enables consistent alignment of hypotheses, references, and evaluation sequences, forming the basis for downstream linguistic analysis. Building on this, we employ standard PoS taggers to perform scalable and reproducible PoS-wise error analysis. Notably, we perform alignment and downstream ASR error analysis across three major segmented writing systems, namely, Abugida (Tamil, Hindi, Kannada), Alphabetic (English, Russian, Greek), and Abjad (Arabic). We further demonstrate how such error information can be leveraged during ASR training to improve metrics such as WER.
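To make the idea of alignment-based, PoS-wise error counting concrete, the following is a minimal sketch and not the authors' implementation. It assumes whitespace-tokenized text, Unicode NFC normalization so that combining marks in non-Latin scripts stay attached to their base characters, a plain word-level Levenshtein alignment, and reference-side PoS tags supplied by some external tagger. The function names `tokenize`, `align`, and `pos_error_counts` and the `"<ins>"` placeholder tag are hypothetical and chosen only for illustration.

```python
import unicodedata
from collections import Counter


def tokenize(text):
    # Assumption: NFC normalization plus whitespace splitting is adequate
    # for the segmented scripts discussed (abugida, alphabetic, abjad).
    return unicodedata.normalize("NFC", text).split()


def align(ref, hyp):
    """Word-level Levenshtein alignment.

    Returns a list of (op, ref_word, hyp_word) with op in
    {"C": correct, "S": substitution, "D": deletion, "I": insertion}.
    """
    n, m = len(ref), len(hyp)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Backtrace from the bottom-right cell to recover the edit operations.
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])):
            op = "C" if ref[i - 1] == hyp[j - 1] else "S"
            ops.append((op, ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            ops.append(("D", ref[i - 1], None))
            i -= 1
        else:
            ops.append(("I", None, hyp[j - 1]))
            j -= 1
    ops.reverse()
    return ops


def pos_error_counts(ops, ref_tags):
    """Count errors per (PoS tag, op), using the tag of the reference word.

    Insertions have no reference word, so they are grouped under the
    hypothetical placeholder tag "<ins>".
    """
    counts = Counter()
    k = 0  # index into ref_tags, advanced for every reference word consumed
    for op, _ref_word, _hyp_word in ops:
        if op == "I":
            counts[("<ins>", "I")] += 1
            continue
        tag = ref_tags[k]
        k += 1
        if op != "C":
            counts[(tag, op)] += 1
    return counts


# Usage: a Tamil reference with one word missing from the hypothesis.
ref = tokenize("நான் பள்ளிக்கு செல்கிறேன்")
hyp = tokenize("நான் செல்கிறேன்")
ops = align(ref, hyp)
errors = pos_error_counts(ops, ["PRON", "NOUN", "VERB"])  # tags assumed given
wer = sum(op != "C" for op, _, _ in ops) / len(ref)
```

Attributing each substitution or deletion to the tag of the reference word is one possible design choice; a real pipeline would also need to decide how to treat insertions and tagger errors, which this sketch does not address.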
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Automatic Speech Recognition | Arabic Dataset | Deletions Count | 0 | 51 |
| Automatic Speech Recognition | SPRING-INX Tamil | D Component Count | 1 | 24 |
| Automatic Speech Recognition | Russian propn | Substitution Errors (D) | 361 | 2 |
| Automatic Speech Recognition | Russian adp | Substitution Count (D) | 206 | 2 |
| Automatic Speech Recognition | Russian cconj | Substitution Error Count (D) | 142 | 2 |
| Automatic Speech Recognition | Russian x | Substitution Count (D) | 5 | 2 |
| Automatic Speech Recognition | Russian part subset | Substitution Count (D) | 16 | 2 |
| Automatic Speech Recognition | Russian pron | Substitution Error (D) | 24 | 2 |
| Automatic Speech Recognition | Russian verb | Substitution Count | 4 | 2 |
| Automatic Speech Recognition | Russian adj | Substitution Error (D) | 0 | 2 |