Fun-ASR Technical Report
About
In recent years, automatic speech recognition (ASR) has witnessed transformative advancements driven by three complementary paradigms: data scaling, model size scaling, and deep integration with large language models (LLMs). However, LLMs are prone to hallucination, which can significantly degrade user experience in real-world ASR applications. In this paper, we present Fun-ASR, a large-scale, LLM-based ASR system that synergistically combines massive data, large model capacity, LLM integration, and reinforcement learning to achieve state-of-the-art performance across diverse and complex speech recognition scenarios. Moreover, Fun-ASR is specifically optimized for practical deployment, with enhancements in streaming capability, noise robustness, code-switching, hotword customization, and other real-world application requirements. Experimental results show that while most LLM-based ASR systems achieve strong performance on open-source benchmarks, they often underperform on real industry evaluation sets. Thanks to production-oriented optimizations, Fun-ASR achieves state-of-the-art performance on real application datasets, demonstrating its effectiveness and robustness in practical settings. The code and models are accessible at https://github.com/FunAudioLLM/Fun-ASR .
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Automatic Speech Recognition | LibriSpeech clean (test) | WER (%) | 1.51 | 1156 |
| Automatic Speech Recognition | LibriSpeech (test-other) | WER (%) | 4.33 | 1151 |
| Automatic Speech Recognition | LibriSpeech (dev-other) | WER (%) | 4.06 | 462 |
| Automatic Speech Recognition | LibriSpeech (dev-clean) | WER (%) | 1.63 | 340 |
| Automatic Speech Recognition | AISHELL-1 (test) | CER (%) | 1.64 | 97 |
| Automatic Speech Recognition | LibriSpeech Other | WER (%) | 4.03 | 96 |
| Automatic Speech Recognition | LibriSpeech Clean | WER (%) | 1.68 | 80 |
| Automatic Speech Recognition | WenetSpeech Meeting (test) | CER (%) | 6.6 | 78 |
| Automatic Speech Recognition | WenetSpeech Net (test) | CER (%) | 6.01 | 57 |
| Automatic Speech Recognition | AISHELL-1 | CER (%) | 1.22 | 50 |
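The WER and CER figures in the table are Levenshtein edit-distance metrics: the minimum number of word-level (or character-level, for CER) substitutions, insertions, and deletions needed to turn the reference transcript into the hypothesis, normalized by the reference length. A minimal illustrative sketch (the function names here are our own, not part of the Fun-ASR codebase):

```python
def edit_distance(ref, hyp):
    """Minimum number of substitutions, insertions, and deletions
    needed to turn ref into hyp (standard dynamic programming)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,              # deletion
                cur[j - 1] + 1,           # insertion
                prev[j - 1] + (r != h),   # substitution (or match)
            ))
        prev = cur
    return prev[-1]

def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: edit distance over words / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / len(ref)

def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: same computation over characters."""
    ref, hyp = list(reference), list(hypothesis)
    return edit_distance(ref, hyp) / len(ref)
```

For example, `wer("the cat sat on the mat", "the cat sat on mat")` is 1/6 ≈ 0.167, i.e. 16.7% when reported as a percentage, as in the table above.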