Index-ASR Technical Report

About

Automatic speech recognition (ASR) has witnessed remarkable progress in recent years, largely driven by the emergence of the LLM-based ASR paradigm. Despite strong performance on a variety of open-source benchmarks, existing LLM-based ASR systems still suffer from two critical limitations. First, they are prone to hallucination errors, often generating excessively long and repetitive outputs that are not well grounded in the acoustic input. Second, they provide limited support for flexible, fine-grained contextual customization. To address these challenges, we propose Index-ASR, a large-scale LLM-based ASR system designed to improve robustness while supporting customizable hotword recognition. The core idea of Index-ASR is to combine an LLM with large-scale training data enriched with background noise and contextual information. Experimental results show that Index-ASR achieves strong performance on both open-source benchmarks and in-house test sets, highlighting its robustness and practicality for real-world ASR applications.

Zheshu Song, Lu Wang, Wei Deng, Zhuo Yang, Yong Wu, Bin Xia • 2025

Related benchmarks

Task	Dataset	Result
Automatic Speech Recognition	LibriSpeech (test-other)	WER 3.54	1447
Automatic Speech Recognition	LibriSpeech (test-clean)	WER 1.92	170
Automatic Speech Recognition	GigaSpeech (test)	WER 10.29	55
Automatic Speech Recognition	WenetSpeech (meeting)	WER 6.17	23
Automatic Speech Recognition	WenetSpeech (net)	WER 5.22	20
Automatic Speech Recognition	In-house ZH domain A (test)	WER 9.9	5
Automatic Speech Recognition	In-house ZH domain B (test)	WER 5.38	5
Automatic Speech Recognition	In-house ZH domain C (test)	WER 4.31	5
Automatic Speech Recognition	In-house ZH domain E (test)	WER 6.57	5
Automatic Speech Recognition	In-house ZH domain F (test)	WER 6.25	5

Showing 10 of 16 rows
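
The results above are reported as word error rate (WER). For readers unfamiliar with the metric, the following is a minimal sketch of the standard definition: the word-level edit distance (substitutions, deletions, and insertions) between hypothesis and reference, divided by the number of reference words, expressed as a percentage. The function name `word_error_rate` is hypothetical, and the sketch applies no text normalization; it is not the scoring pipeline used for the numbers on this page, which is not described here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent between two whitespace-tokenized strings.

    Illustrative helper only (hypothetical name); real evaluations
    usually normalize case and punctuation before scoring.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return 100.0 * prev[-1] / len(ref)
```

Because insertions count as errors, WER can exceed 100% when a system emits long, repetitive output for a short reference, which is why the hallucination behavior described in the abstract is so costly under this metric. For Mandarin test sets the same computation is commonly applied over characters rather than words (for example, by passing space-separated characters); whether the figures above do so is not stated on this page.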
