SeaLLMs-Audio: Large Audio-Language Models for Southeast Asia

About

We introduce SeaLLMs-Audio, the first large audio-language model (LALM) tailored for multiple Southeast Asian (SEA) languages: Indonesian (id), Thai (th), and Vietnamese (vi), alongside English (en) and Chinese (zh). Trained on a large-scale audio corpus, SeaLLMs-Audio exhibits strong performance across diverse audio-centric tasks, spanning fine-grained audio understanding and voice-based interaction. Its key features include: 1) Multilingual: the model primarily supports 5 languages, namely Indonesian, Thai, Vietnamese, English, and Chinese; 2) Multimodal: the model accepts flexible input modalities, including audio only, text only, and audio with text; 3) Multi-task: the model supports a wide range of tasks, including audio analysis tasks such as Audio Captioning, Automatic Speech Recognition, Speech-to-Text Translation, Speech Emotion Recognition, Speech Question Answering, and Speech Summarization. It also enables voice-based dialogue, including answering factual, mathematical, and general knowledge queries. As a significant step towards advancing audio LLMs in Southeast Asia, we expect SeaLLMs-Audio to benefit both the regional research community and industry. To automate LALM evaluation for Southeast Asia, we introduce SeaBench-Audio, a benchmark spanning multiple tasks. Experiments show that SeaLLMs-Audio achieves competitive performance compared with other LALMs on SEA languages.

Chaoqun Liu, Mahani Aljunied, Guizhen Chen, Hou Pong Chan, Weiwen Xu, Yu Rong, Wenxuan Zhang • 2025

Related benchmarks

Task	Dataset	Result
Automatic Speech Recognition	AISHELL-1	CER 9.65	55
Automatic Speech Recognition	LibriSpeech	WER 0.9474	35
Speech Deepfake Detection	SeaCF (seen setting)	Accuracy 88.74	32
Speech Deepfake Detection	CodecFake (CF) seen setting	Accuracy 90.75	32
Automatic Speech Recognition	fleurs Tamil	WER 105.3	17
Audio CAPTCHA Bypass	Geetest Audio CAPTCHA	Bypass Rate 100	10
Audio CAPTCHA Bypass	ILLUSIONAUDIO	Bypass Rate 0.00	10
Audio CAPTCHA Bypass	Google Audio CAPTCHA	Bypass Rate 80	10
Audio CAPTCHA Bypass	MTCaptcha Audio CAPTCHA	Bypass Rate 16.66	10
Audio CAPTCHA Bypass	Math Audio CAPTCHA	Bypass Rate 10	10

Showing 10 of 25 rows
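
The table above reports word error rate (WER) and character error rate (CER) for the speech recognition rows. The page does not say how these values were computed or on what scale, so the following is only a minimal sketch of the standard definition: edit distance between reference and hypothesis, divided by the reference length. The function names are hypothetical, results are expressed as percentages, and stripping spaces before computing CER is an assumption about tokenization rather than something the source states.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference token missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra token in hypothesis
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance as a percentage of the reference length."""
    if not ref_tokens:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def wer(reference, hypothesis):
    """Word error rate over whitespace-separated tokens."""
    return error_rate(reference.split(), hypothesis.split())


def cer(reference, hypothesis):
    """Character error rate, ignoring spaces (an assumed convention)."""
    return error_rate(list(reference.replace(" ", "")),
                      list(hypothesis.replace(" ", "")))
```

Because the denominator is the reference length only, a hypothesis with many inserted words can score above 100. For example, `wer("hello", "hello there my friend")` has three insertions against a one-word reference. This is one way a WER above 100 can arise if a value is read as a percentage.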
