
SeaLLMs-Audio: Large Audio-Language Models for Southeast Asia

About

We introduce SeaLLMs-Audio, the first large audio-language model (LALM) tailored for multiple Southeast Asian (SEA) languages: Indonesian (id), Thai (th), and Vietnamese (vi), alongside English (en) and Chinese (zh). Trained on a large-scale audio corpus, SeaLLMs-Audio exhibits strong performance across diverse audio-centric tasks, spanning fine-grained audio understanding and voice-based interaction. Its key features include: 1) Multilingual: the model primarily supports 5 languages, namely Indonesian, Thai, Vietnamese, English, and Chinese; 2) Multimodal: the model accepts flexible input modalities, including audio only, text only, and audio with text; 3) Multi-task: the model supports a wide range of tasks, including audio analysis tasks such as Audio Captioning, Automatic Speech Recognition, Speech-to-Text Translation, Speech Emotion Recognition, Speech Question Answering, and Speech Summarization. It also enables voice-based dialogue, including answering factual, mathematical, and general knowledge queries. As a significant step towards advancing audio LLMs in Southeast Asia, we expect SeaLLMs-Audio to benefit both the regional research community and industry. To automate LALM evaluation for Southeast Asia, we introduce SeaBench-Audio, a benchmark spanning multiple tasks. Experiments show that SeaLLMs-Audio achieves competitive performance compared with other LALMs on SEA languages.

Chaoqun Liu, Mahani Aljunied, Guizhen Chen, Hou Pong Chan, Weiwen Xu, Yu Rong, Wenxuan Zhang • 2025

Related benchmarks

Task                                    Dataset                    Metric        Result    Rank
Automatic Speech Recognition            AISHELL-1                  CER           9.65      50
Automatic Speech Recognition            LibriSpeech                WER           0.9474    24
Automatic Speech Recognition            fleurs Tamil               WER           105.3     17
Audio CAPTCHA Bypass                    Geetest Audio CAPTCHA      Bypass Rate   100       10
Audio CAPTCHA Bypass                    ILLUSIONAUDIO              Bypass Rate   0.00e+0   10
Audio CAPTCHA Bypass                    Google Audio CAPTCHA       Bypass Rate   80        10
Audio CAPTCHA Bypass                    MTCaptcha Audio CAPTCHA    Bypass Rate   16.66     10
Audio CAPTCHA Bypass                    Math Audio CAPTCHA         Bypass Rate   10        10
Close-ended Spoken Question Answering   Audio-MLQA                 Score (EN)    3.74      10
Open-ended instruction following        AlpacaEval Audio           EN Score      3.93      10
Showing 10 of 23 rows
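
For readers unfamiliar with the CER and WER metrics listed above, the following is a minimal sketch of how such error rates are conventionally computed: the edit distance (substitutions, deletions, and insertions) between hypothesis and reference, divided by the reference length. It is not the evaluation code behind these rows; the function names, the whitespace tokenization, and the lack of text normalization are all assumptions made for illustration. Because insertions count as errors, the rate can exceed 1.0 (that is, 100%), which is how a WER above 100 can arise.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))  # distance from empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from reference prefix to empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference token missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis token)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance normalized by reference length, as a fraction."""
    if not ref_tokens:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def wer(reference, hypothesis):
    """Word error rate; assumes whitespace tokenization (illustrative only)."""
    return error_rate(reference.split(), hypothesis.split())


def cer(reference, hypothesis):
    """Character error rate; spaces are dropped here (an assumption)."""
    return error_rate(list(reference.replace(" ", "")),
                      list(hypothesis.replace(" ", "")))


if __name__ == "__main__":
    # Three inserted words against a two-word reference: rate above 1.0.
    print(wer("hello world", "hello there big wide world"))
    # One substituted character out of three.
    print(cer("abc", "abd"))
```

Multiplying the returned fraction by 100 gives the percentage form. The table does not state which scale each row uses, so values should be compared only within the same dataset.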
