Omni-CLST: Error-aware Curriculum Learning with guided Selective chain-of-Thought for audio question answering

About

With the rapid progress of large audio-language models (LALMs), audio question answering (AQA) has emerged as a challenging task requiring both fine-grained audio understanding and complex reasoning. While current methods mainly rely on constructing new datasets via captioning or reasoning traces, existing high-quality AQA data remains underutilized. To address this, we propose Omni-CLST, an error-aware Curriculum Learning framework with guided Selective Chain-of-Thought. The framework efficiently leverages existing high-quality datasets through two key strategies: an error-aware curriculum that organizes samples by difficulty, and a guided thought dropout mechanism that focuses reasoning on challenging cases. Experiments show that Omni-CLST achieves 73.80% on MMAU-mini and a new state of the art of 64.30% on MMAR, demonstrating robust generalization in multimodal audio-language understanding.

Jinghua Zhao, Hang Su, Lichun Fan, Zhenbo Luo, Hui Wang, Haoqin Sun, Yong Qin • 2025
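
The two strategies named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Sample` fields, the `error_rate` difficulty signal, the 0.5 threshold, and the `<think>` tag format are all assumptions introduced here for illustration. The sketch orders samples from easy to hard by how often a base model gets them wrong, then keeps the reasoning trace only for the harder samples.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Sample:
    question: str
    answer: str
    thought: Optional[str]  # reasoning trace, if one is available
    error_rate: float       # assumed signal: fraction of base-model attempts that were wrong


def build_curriculum(samples: List[Sample]) -> List[Sample]:
    """Order samples from easy to hard using the assumed error-rate signal."""
    return sorted(samples, key=lambda s: s.error_rate)


def guided_thought_dropout(samples: List[Sample], threshold: float = 0.5) -> List[Sample]:
    """Drop the reasoning trace for samples below the (hypothetical) difficulty threshold."""
    return [
        s if s.error_rate >= threshold else replace(s, thought=None)
        for s in samples
    ]


def format_target(sample: Sample) -> str:
    """Build a training target; the <think> tag format is an assumption."""
    if sample.thought is None:
        return sample.answer
    return f"<think>{sample.thought}</think>{sample.answer}"


if __name__ == "__main__":
    data = [
        Sample("Which instrument plays?", "A", "The timbre is bowed...", 0.9),
        Sample("Is there speech?", "B", "A voice is audible...", 0.0),
    ]
    for s in guided_thought_dropout(build_curriculum(data)):
        print(format_target(s))
```

Under these assumptions, easy samples are trained on direct answers while hard samples retain their chain-of-thought, which is one plausible reading of "focuses reasoning on challenging cases".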

Related benchmarks

Task	Dataset	Metric	Result	Rank
Audio Reasoning	MMAR (test)	Average Score	63	57
Multi-task Audio Reasoning	MMAU Mini	Sound Score	0.685	7
