Audio Flamingo 3: Advancing Audio Intelligence with Fully Open Large Audio Language Models

About

We present Audio Flamingo 3 (AF3), a fully open state-of-the-art (SOTA) large audio-language model that advances reasoning and understanding across speech, sound, and music. AF3 introduces: (i) AF-Whisper, a unified audio encoder trained using a novel strategy for joint representation learning across all 3 modalities of speech, sound, and music; (ii) flexible, on-demand thinking, allowing the model to do chain-of-thought-type reasoning before answering; (iii) multi-turn, multi-audio chat; (iv) long audio understanding and reasoning (including speech) up to 10 minutes; and (v) voice-to-voice interaction. To enable these capabilities, we propose several large-scale training datasets curated using novel strategies, including AudioSkills-XL, LongAudio-XL, AF-Think, and AF-Chat, and train AF3 with a novel five-stage curriculum-based training strategy. Trained on only open-source audio data, AF3 achieves new SOTA results on more than 20 (long) audio understanding and reasoning benchmarks, surpassing both open-weight and closed-source models trained on much larger datasets.

Arushi Goel, Sreyan Ghosh, Jaehyeon Kim, Sonal Kumar, Zhifeng Kong, Sang-gil Lee, Chao-Han Huck Yang, Ramani Duraiswami, Dinesh Manocha, Rafael Valle, Bryan Catanzaro • 2025

Related benchmarks

Task | Dataset | Result | Rank
Automatic Speech Recognition | LibriSpeech clean (test) | WER 1.57 | 1207
Automatic Speech Recognition | LibriSpeech (test-other) | WER 3.13 | 1206
Audio Captioning | AudioCaps (test) | CIDEr 0.79 | 157
Automatic Speech Recognition | LibriSpeech Other | WER 3.13 | 123
Speaker Verification | VoxCeleb1 (Vox1-O) | -- | 105
Multimodal Understanding | MMMU | MMMU Score 72.42 | 102
Audio-Visual Question Answering | AVQA | Accuracy 64.3 | 85
Audio Captioning | AudioCaps | CIDEr 70 | 66
Music Genre Classification | GTZAN | Accuracy 83.2 | 62
Audio-visual understanding | Daily-Omni | Accuracy 52.5 | 58
Showing 10 of 171 rows