AQAScore: Evaluating Semantic Alignment in Text-to-Audio Generation via Audio Question Answering
About
Although text-to-audio generation has made remarkable progress in realism and diversity, the development of evaluation metrics has not kept pace. Widely adopted approaches, typically based on embedding similarity (for example, CLAPScore), measure general relevance effectively but remain limited in fine-grained semantic alignment and compositional reasoning. To address this, we introduce AQAScore, a backbone-agnostic evaluation framework that leverages the reasoning capabilities of audio-aware large language models (ALLMs). AQAScore reformulates assessment as a probabilistic semantic verification task: rather than relying on open-ended text generation, it estimates alignment by computing the exact log-probability of a "Yes" answer to targeted semantic queries. We evaluate AQAScore across multiple benchmarks, including human-rated relevance, pairwise comparison, and compositional reasoning tasks. Experimental results show that AQAScore consistently achieves higher correlation with human judgments than similarity-based metrics and generative prompting baselines, indicating that it captures subtle semantic inconsistencies and that its performance scales with the capability of the underlying ALLM.
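The scoring idea described above, reading off the log-probability of "Yes" instead of parsing generated text, can be illustrated with a minimal sketch. It assumes the ALLM exposes next-token logits at the first answer position and that "Yes" maps to a single token; the function names, the toy three-token vocabulary, and the logit values are hypothetical and are not taken from the paper.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of raw logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def yes_log_probability(first_token_logits, yes_token_id):
    """Return (log P("Yes"), P("Yes")) at the first answer position.

    Assumption: the model was prompted with the audio plus a yes/no
    question about the text prompt, and "Yes" is a single token.
    """
    log_p_yes = log_softmax(first_token_logits)[yes_token_id]
    return log_p_yes, math.exp(log_p_yes)


# Toy vocabulary (hypothetical): index 0 = "Yes", 1 = "No", 2 = other.
toy_logits = [2.0, 0.5, -1.0]
log_p, p = yes_log_probability(toy_logits, yes_token_id=0)
print(f"log P(Yes) = {log_p:.4f}, P(Yes) = {p:.4f}")
```

Because the score is a continuous quantity rather than a hard yes/no decision, it can be correlated with human ratings or compared across two candidate audio clips without any text parsing.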
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Audio Assessment Correlation | RELATE | LCC | 0.544 | 25 |
| Audio Assessment Correlation | PAM | LCC | 0.582 | 23 |
| Hallucination Detection | BRACE Hallucination 1.0 (test) | AudioCaps Score | 98.8 | 20 |
| Text-Audio Alignment | RELATE-Pair | Pair Accuracy | 77.6 | 20 |
| Text-Audio Alignment | Baton two-sound-event | AUC | 0.69 | 20 |
| Text-Audio Alignment | Baton three-sound-event | AUC | 0.65 | 20 |
| Text-Audio Alignment | Baton-Pair two-sound-event | Pair Accuracy | 71 | 20 |
| Text-Audio Alignment | Baton-Pair three-sound-event | Pair Accuracy | 69.1 | 20 |
| Audio-Text Alignment Evaluation | BRACE Clotho-Main 1.0 (test) | HH | 68.3 | 20 |
| Audio-Text Alignment Evaluation | BRACE AudioCaps-Main 1.0 (test) | HH | 51.1 | 20 |
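
For readers unfamiliar with two of the metrics reported in the table, the sketch below shows how they are commonly computed. It assumes LCC denotes the Pearson linear correlation coefficient between metric scores and human ratings, and that Pair Accuracy is the fraction of pairs in which the metric prefers the same clip as the human annotators; the benchmarks' exact protocols may differ, and all data values here are made up for illustration.

```python
import math


def pearson_lcc(xs, ys):
    """Pearson linear correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def pair_accuracy(pairs):
    """Fraction of pairs where the metric agrees with the human preference.

    Each pair is (score_a, score_b, human_prefers_a). A tie in scores is
    treated as the metric preferring clip B (an arbitrary choice here).
    """
    correct = sum((a > b) == prefers_a for a, b, prefers_a in pairs)
    return correct / len(pairs)


# Hypothetical metric scores and human ratings for four audio clips.
metric_scores = [0.9, 0.4, 0.7, 0.2]
human_ratings = [4.5, 2.0, 3.5, 1.5]
print("LCC:", pearson_lcc(metric_scores, human_ratings))

# Hypothetical pairwise comparisons.
toy_pairs = [(0.9, 0.1, True), (0.2, 0.8, True), (0.3, 0.7, False), (0.6, 0.4, True)]
print("Pair accuracy:", pair_accuracy(toy_pairs))
```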