
Omni-CLST: Error-aware Curriculum Learning with guided Selective chain-of-Thought for audio question answering

About

With the rapid progress of large audio-language models (LALMs), audio question answering (AQA) has emerged as a challenging task requiring both fine-grained audio understanding and complex reasoning. While current methods mainly rely on constructing new datasets via captioning or reasoning traces, existing high-quality AQA data remains underutilized. To address this, we propose Omni-CLST, an error-aware Curriculum Learning framework with guided Selective Chain-of-Thought. The framework efficiently leverages existing high-quality datasets through two key strategies: an error-aware curriculum that organizes samples by difficulty, and a guided thought dropout mechanism that focuses reasoning on challenging cases. Experiments show that Omni-CLST achieves 73.80% on MMAU-mini and a new state of the art of 64.30% on MMAR, demonstrating robust generalization in multimodal audio-language understanding.
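
The two strategies named in the abstract can be illustrated with a minimal sketch. The abstract does not give implementation details, so everything here is an assumption made for illustration: the `Sample` fields, the use of a per-sample error count `n_errors` as the difficulty signal, the rule that a sample with zero errors counts as easy, the `drop_prob_easy` parameter, and the `<think>` tag format. It should not be read as the authors' actual method.

```python
import random
from dataclasses import dataclass


@dataclass
class Sample:
    question: str
    answer: str
    thought: str   # reasoning trace attached to the sample
    n_errors: int  # hypothetical: wrong attempts by a base model on this sample


def build_curriculum(samples):
    """Order samples from easy to hard using the (assumed) error count."""
    return sorted(samples, key=lambda s: s.n_errors)


def format_target(sample, drop_prob_easy=1.0, rng=random):
    """Build the training target, dropping the thought for easy samples.

    Samples the base model already answers correctly (n_errors == 0) lose
    their reasoning trace with probability drop_prob_easy; error-prone
    samples always keep it.
    """
    is_easy = sample.n_errors == 0
    if is_easy and rng.random() < drop_prob_easy:
        return sample.answer
    return f"<think>{sample.thought}</think>{sample.answer}"
```

In this sketch, training would iterate over `build_curriculum(samples)` and use `format_target` to decide per sample whether the reasoning trace is part of the supervised output.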

Jinghua Zhao, Hang Su, Lichun Fan, Zhenbo Luo, Hui Wang, Haoqin Sun, Yong Qin • 2025

Related benchmarks

Task | Dataset | Result | Rank
Audio Reasoning | MMAR (test) | Sound Score: 58.2 | 17
Multi-task Audio Reasoning | MMAU Mini | Sound Score: 0.685 | 7