AQAScore: Evaluating Semantic Alignment in Text-to-Audio Generation via Audio Question Answering

About

Although text-to-audio generation has made remarkable progress in realism and diversity, the development of evaluation metrics has not kept pace. Widely adopted approaches, typically based on embedding similarity such as CLAPScore, effectively measure general relevance but remain limited in fine-grained semantic alignment and compositional reasoning. To address this, we introduce AQAScore, a backbone-agnostic evaluation framework that leverages the reasoning capabilities of audio-aware large language models (ALLMs). AQAScore reformulates assessment as a probabilistic semantic verification task: rather than relying on open-ended text generation, it estimates alignment by computing the exact log-probability of a "Yes" answer to targeted semantic queries. We evaluate AQAScore across multiple benchmarks, including human-rated relevance, pairwise comparison, and compositional reasoning tasks. Experimental results show that AQAScore consistently achieves higher correlation with human judgments than similarity-based metrics and generative prompting baselines, demonstrating its effectiveness in capturing subtle semantic inconsistencies and its ability to scale with the capability of the underlying ALLMs.
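The scoring idea described above, reading off the log-probability of a "Yes" answer instead of parsing generated text, can be illustrated with a minimal sketch. It assumes the ALLM exposes next-token logits for a yes/no query about the audio; the toy vocabulary, the logit values, and the function names `log_softmax` and `yes_log_prob` are hypothetical and are not taken from the paper.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of raw logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def yes_log_prob(next_token_logits, yes_token_id):
    """Log-probability that the first answer token is "Yes".

    `next_token_logits` stands in for the logits an audio-aware LLM would
    produce after a query such as "Does the audio match this description?"
    (hypothetical prompt wording, not the paper's).
    """
    return log_softmax(next_token_logits)[yes_token_id]


# Hypothetical three-token vocabulary and made-up logits for illustration.
vocab = {"Yes": 0, "No": 1, "Maybe": 2}
logits = [2.0, 0.5, -1.0]

score = yes_log_prob(logits, vocab["Yes"])
print(f"log P(Yes) = {score:.4f}, P(Yes) = {math.exp(score):.4f}")
```

Because the score is read directly from the model's output distribution, it is continuous and needs no answer parsing, which is one plausible reason such a score can separate near-miss captions more finely than a hard Yes/No decision.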

Chun-Yi Kuan, Kai-Wei Chang, Hung-yi Lee • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Audio Assessment Correlation | RELATE | LCC | 0.544 | 25 |
| Audio Assessment Correlation | PAM | LCC | 0.582 | 23 |
| Hallucination Detection | BRACE Hallucination 1.0 (test) | AudioCaps Score | 98.8 | 20 |
| Text-Audio Alignment | RELATE-Pair | Pair Accuracy | 77.6 | 20 |
| Text-Audio Alignment | Baton two-sound-event | AUC | 0.69 | 20 |
| Text-Audio Alignment | Baton three-sound-event | AUC | 0.65 | 20 |
| Text-Audio Alignment | Baton-Pair two-sound-event | Pair Accuracy | 71 | 20 |
| Text-Audio Alignment | Baton-Pair three-sound-event | Pair Accuracy | 69.1 | 20 |
| Audio-Text Alignment Evaluation | BRACE Clotho-Main 1.0 (test) | HH | 68.3 | 20 |
| Audio-Text Alignment Evaluation | BRACE AudioCaps-Main 1.0 (test) | HH | 51.1 | 20 |
Showing 10 of 12 rows
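
For readers unfamiliar with two of the metrics listed above, the following sketch shows how they are commonly computed. It assumes that LCC denotes the Pearson linear correlation coefficient between metric scores and human ratings, and that pair accuracy is the fraction of pairs in which the metric prefers the same item as the human raters. The benchmarks' exact protocols (for example, tie handling) are not given here, and all numbers below are made up for illustration.

```python
import math


def pearson_lcc(xs, ys):
    """Pearson linear correlation between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def pair_accuracy(pairs):
    """Fraction of pairs where the metric agrees with the human preference.

    Each pair is (score_a, score_b, human_prefers_a). Returns a value in
    [0, 1]; multiply by 100 to compare with percentage-style results.
    """
    correct = sum((a > b) == prefers_a for a, b, prefers_a in pairs)
    return correct / len(pairs)


# Made-up metric scores and human ratings for four audio clips.
metric_scores = [-0.2, -1.5, -0.7, -2.3]
human_ratings = [4.5, 2.0, 3.5, 1.0]
print(f"LCC = {pearson_lcc(metric_scores, human_ratings):.3f}")

# Made-up pairwise comparisons: (score for A, score for B, humans prefer A).
toy_pairs = [(-0.2, -1.5, True), (-2.3, -0.7, False), (-0.7, -0.2, True)]
print(f"Pair accuracy = {pair_accuracy(toy_pairs):.3f}")
```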
