Dolphin-CN-Dialect: Where Chinese Dialects Matter
About
We present Dolphin-CN-Dialect, a streaming-capable automatic speech recognition (ASR) model focused on Chinese and dialect-rich scenarios. Compared with the previous version, Dolphin-CN-Dialect introduces substantial improvements in data processing, tokenization, training stability, and data sampling. To address highly imbalanced dialect data, we propose a temperature-based sampling strategy that balances standard Mandarin against low-resource dialects, leading to significant gains in dialect recognition performance. We also redesign the tokenizer to better match linguistic characteristics: it uses character-level modeling for Chinese and subword modeling for English, and introduces extensible dialect tokens. Experimental results show that Dolphin-CN-Dialect improves dialect recognition accuracy and reduces character error rate (CER) relative to Dolphin. Dolphin-CN-Dialect is also competitive with recent state-of-the-art open-source ASR models while being significantly smaller. It supports both streaming and non-streaming inference, allowing a practical trade-off between latency and accuracy, and it offers customization through hotword support and efficient deployment optimized for specialized hardware. These improvements make Dolphin-CN-Dialect a strong and practical solution for real-world multi-dialect ASR applications.
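
The abstract does not give the exact form of the temperature-based sampling strategy. As a minimal sketch, the block below assumes the form commonly used for imbalanced multilingual training, where each group's sampling probability is proportional to its data share raised to the power 1/T. The function name, the group names, and the hour counts are all hypothetical and serve only to illustrate the effect of the temperature.

```python
def temperature_sampling_probs(counts, temperature=1.0):
    """Return per-group sampling probabilities p_i proportional to (n_i / N) ** (1 / T).

    Assumed formulation, not taken from the paper:
    T = 1 reproduces the natural data distribution, and larger T
    flattens it toward uniform, upweighting low-resource groups.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    total = sum(counts.values())
    scaled = {
        name: (n / total) ** (1.0 / temperature)
        for name, n in counts.items()
    }
    norm = sum(scaled.values())
    return {name: s / norm for name, s in scaled.items()}


# Hypothetical amounts of training audio (hours) per group.
counts = {"mandarin": 10000, "sichuan": 500, "minnan": 50}

probs_t1 = temperature_sampling_probs(counts, temperature=1.0)
probs_t5 = temperature_sampling_probs(counts, temperature=5.0)

for name in counts:
    print(f"{name:10s} T=1: {probs_t1[name]:.4f}   T=5: {probs_t5[name]:.4f}")
```

Under this assumed form, raising T increases the share of low-resource dialects in each batch without changing their relative order, so Mandarin stays the most frequently sampled group.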
Related benchmarks
| Task | Dataset | CER (%) | Rank |
|---|---|---|---|
| Automatic Speech Recognition | KeSpeech | 5.04 | 35 |
| Speech Recognition | Haitan internal tw (test) | 6.68 | 10 |
| Speech Recognition | Haitan internal sichuan (test) | 9.63 | 10 |
| Speech Recognition | Haitan internal wu (test) | 9.49 | 10 |
| Speech Recognition | Haitan internal minnan (test) | 20.74 | 10 |
| Speech Recognition | Haitan internal liaoning (test) | 3.25 | 10 |
| Speech Recognition | Haitan internal fujian (test) | 3.62 | 10 |
| Speech Recognition | Haitan internal hunan (test) | 11.89 | 10 |
| Speech Recognition | Haitan internal guangdong (test) | 6.03 | 10 |
| Speech Recognition | Haitan internal (wenzhou) (test) | 2.25 | 10 |
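
The results above are reported as character error rate (CER). For readers unfamiliar with the metric, here is a minimal sketch of the standard definition: the character-level Levenshtein distance between hypothesis and reference, divided by the reference length, expressed as a percentage. The function names and example strings are hypothetical, and the sketch applies no text normalization (punctuation, casing, or whitespace handling), which the authors' evaluation may well include.

```python
def edit_distance(ref, hyp):
    """Character-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference character
                curr[j - 1] + 1,     # insertion of a hypothesis character
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def cer_percent(ref, hyp):
    """CER in percent: 100 * edit distance / number of reference characters."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


# Hypothetical example: one substituted character out of six.
reference = "今天天气很好"
hypothesis = "今天天汽很好"
print(f"CER = {cer_percent(reference, hypothesis):.2f}%")
```

Corpus-level CER is usually computed by summing edit distances and reference lengths over all utterances before dividing, rather than averaging per-utterance values.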