
OSUM-Pangu: An Open-Source Multidimension Speech Understanding Foundation Model Built upon OpenPangu on Ascend NPUs

About

Recent advances in Speech Large Language Models have significantly enhanced multi-dimensional speech understanding. However, most high-performance frameworks are optimized for GPU-centric ecosystems and proprietary backbones, leaving a significant gap for deployment on non-CUDA computing infrastructure. In this paper, we present OSUM-Pangu, a fully open-source speech understanding foundation model developed on a completely non-CUDA software and hardware stack. By integrating an audio encoder with the openPangu-7B LLM backbone, we implement the entire training and inference pipeline on the Ascend NPU platform. To achieve efficient task alignment under non-CUDA resource constraints, we adopt a practical training process that sequentially bridges speech perception and user intent recognition. Experimental results show that OSUM-Pangu achieves task accuracy comparable to mainstream GPU-based models while maintaining robust natural language interaction capabilities. Our work provides a reproducible, non-CUDA baseline for the open-source speech community and promotes the independent evolution of multimodal intelligence.

Yujie Liao, Xuelong Geng, Hongfei Xue, Shuiyuan Wang, Lei Xie · 2026

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Automatic Speech Recognition | LibriSpeech Other | WER 8.36 | 96 |
| Automatic Speech Recognition | LibriSpeech Clean | WER 3.51 | 80 |
| Emotion Recognition | MELD (test) | -- | 28 |
| Automatic Speech Recognition | WenetSpeech (meeting) | WER 10.49 | 23 |
| Speech-to-Text Question-Answering | TriviaQA | Accuracy 28.9 | 23 |
| Speech-to-Text Question-Answering | WebQ | Accuracy 29.5 | 23 |
| Speech-to-Text Question-Answering | LlamaQ | Accuracy 44.6 | 23 |
| Automatic Speech Recognition | AISHELL-2 mic | CER 3.01 | 12 |
| Automatic Speech Recognition | AISHELL-2 i (iOS) | WER 2.98 | 6 |
| Age Classification | Common Voice (test) | Accuracy 83.31 | 5 |

(10 of 16 rows shown.)
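The ASR rows above report word error rate (WER) or character error rate (CER). As a reference point, here is a minimal sketch of how such scores are commonly computed: the Levenshtein edit distance (substitutions + deletions + insertions) between hypothesis and reference tokens, summed over a test set and divided by the total number of reference tokens. This is not the OSUM-Pangu scoring script, which the page does not describe. The helper names (`edit_distance`, `error_rate`, `words`, `chars`) are hypothetical, and the sketch applies no text normalization (casing, punctuation), which real benchmark scripts usually do and which affects reported numbers.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (dynamic programming)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix to each hyp prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,         # deletion of a reference token
                          curr[j - 1] + 1,     # insertion of a hypothesis token
                          prev[j - 1] + cost)  # substitution (or match if cost == 0)
        prev = curr
    return prev[len(hyp)]


def words(text):
    """Word tokens for WER: whitespace split (hypothetical, no normalization)."""
    return text.split()


def chars(text):
    """Character tokens for CER, ignoring whitespace (common for Mandarin)."""
    return [c for c in text if not c.isspace()]


def error_rate(pairs, tokenize):
    """Corpus-level error rate in percent: total edits / total reference tokens."""
    errors = 0
    total = 0
    for reference, hypothesis in pairs:
        ref_tokens = tokenize(reference)
        errors += edit_distance(ref_tokens, tokenize(hypothesis))
        total += len(ref_tokens)
    if total == 0:
        raise ValueError("references contain no tokens")
    return 100.0 * errors / total


if __name__ == "__main__":
    pairs = [("the cat sat on the mat", "the cat sit on mat"),
             ("hello world", "hello world")]
    print(error_rate(pairs, words))  # 2 edits / 8 reference words
    print(error_rate([("今天天气很好", "今天天汽很好")], chars))  # 1 edit / 6 characters
```

Pooling edits over the whole set, rather than averaging per-utterance rates, keeps short utterances from dominating the score.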
