A Text-To-Text Alignment Algorithm for Better Evaluation of Modern Speech Recognition Systems

About

Modern neural networks have greatly improved performance across speech recognition benchmarks. However, these gains are often driven by frequent words with limited semantic weight, which can obscure meaningful differences in word error rate (WER), the primary evaluation metric. Errors in rare terms, named entities, and domain-specific vocabulary are more consequential but remain hidden by aggregate metrics. This highlights the need for finer-grained error analysis, which depends on accurate alignment between reference and model transcripts. Conventional alignment methods, however, are not designed for such precision. We propose a novel alignment algorithm that couples dynamic programming with beam search scoring. Compared to traditional text alignment methods, our approach aligns individual errors more accurately, enabling reliable error analysis. The algorithm is made available via PyPI.
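For reference, here is a minimal sketch of the conventional baseline that the abstract contrasts against: word-level Levenshtein alignment computed by dynamic programming with a backtrace, followed by word error rate (WER) and a per-word error list derived from that alignment. This is not the authors' algorithm, which additionally uses beam search scoring and whose PyPI package name and API are not given here. The function names `levenshtein_align`, `word_error_rate`, and `misrecognized_reference_words` are hypothetical, and unit costs for substitution, deletion, and insertion are an assumption.

```python
from typing import List, Optional, Tuple

# One aligned position: (reference_word, hypothesis_word).
# None marks a gap: (ref, None) is a deletion, (None, hyp) is an insertion.
Pair = Tuple[Optional[str], Optional[str]]


def levenshtein_align(ref: List[str], hyp: List[str]) -> List[Pair]:
    """Align two word sequences with unit-cost edit-distance DP plus backtrace."""
    n, m = len(ref), len(hyp)
    # cost[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            cost[i][j] = min(
                cost[i - 1][j - 1] + sub,  # match or substitution
                cost[i - 1][j] + 1,        # deletion
                cost[i][j - 1] + 1,        # insertion
            )
    # Backtrace from the bottom-right cell. Ties are broken in the fixed order
    # diagonal > deletion > insertion, i.e. one arbitrary optimal path is kept.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            pairs.append((ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((ref[i - 1], None))
            i -= 1
        else:
            pairs.append((None, hyp[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


def word_error_rate(pairs: List[Pair]) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    errors = sum(1 for r, h in pairs if r != h)
    n_ref = sum(1 for r, _ in pairs if r is not None)
    if n_ref == 0:
        raise ValueError("WER is undefined for an empty reference")
    return errors / n_ref


def misrecognized_reference_words(pairs: List[Pair]) -> List[str]:
    """Reference words that were substituted or deleted (finer-grained than WER)."""
    return [r for r, h in pairs if r is not None and r != h]


if __name__ == "__main__":
    ref = "the patient was prescribed metformin twice daily".split()
    hyp = "the patient was prescribed met forming twice daily".split()
    pairs = levenshtein_align(ref, hyp)
    print(pairs)
    print(round(word_error_rate(pairs), 3))     # → 0.286 (2 errors / 7 words)
    print(misrecognized_reference_words(pairs))  # → ['metformin']
```

In this example two alignments have the same minimal cost: "metformin" can be paired with either "met" or "forming", with the other word counted as an insertion. The fixed tie-breaking rule picks "forming" without any notion of which pairing is more plausible. WER is unaffected, but any per-word error analysis inherits the arbitrary choice, which illustrates the kind of imprecision in aligning individual errors that motivates a more careful alignment method.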

Lasse Borgholt, Jakob Havtorn, Christian Igel, Lars Maaløe, Zheng-Hua Tan • 2025

Related benchmarks

Task | Dataset | Metric | Result | Rank
Transcript Alignment | TED-LIUM v3 (test) | Character GLE | 90.3 | 16
Transcript Alignment | PriMock57 (PM57) 1 (test) | Character GLE | 84.6 | 16
Transcript Alignment | Common Voice English 8 (test) | Character GLE | 77 | 16
Speech Alignment | Common Voice Spanish | Character GLE (%) | 77.8 | 3
Speech Alignment | Common Voice English | Delta Character GLE (%) | -4.3 | 3
Speech Alignment | Common Voice Portuguese | Character GLE | 78.3 | 3
Speech Alignment | Common Voice Turkish | Character GLE | 77.7 | 3
Speech Alignment | Common Voice German | Character GLE (%) | 76.9 | 3
Speech Alignment | Common Voice Polish | Character GLE | 76.7 | 3
Speech Alignment | Common Voice Indonesian | Character GLE | 76.5 | 3

(10 of 12 rows shown.)
