RELATE: Subjective evaluation dataset for automatic evaluation of relevance between text and audio

About

In text-to-audio (TTA) research, the relevance between the input text and the output audio is an important evaluation aspect. It has traditionally been evaluated both subjectively and objectively. However, subjective evaluation is costly in money and time, and it is unclear how well objective evaluation correlates with subjective evaluation scores. In this study, we construct RELATE, an open-source dataset of subjective relevance evaluations. We also benchmark a model that automatically predicts the subjective evaluation score from synthesized audio. Our model outperforms a conventional CLAPScore model, and this trend holds across many sound categories.

Yusuke Kanamori, Yuki Okamoto, Taisei Takano, Shinnosuke Takamichi, Yuki Saito, Hiroshi Saruwatari • 2025

Related benchmarks

Task | Dataset | Metric | Result | Rank
Audio Assessment Correlation | RELATE | LCC | 0.385 | 25
Audio-text semantic alignment | XACLE (test) | SRCC | 0.3345 | 4
Text-audio relevance prediction | XACLE Challenge official 2026 (test) | SRCC | 0.3345 | 2
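
The metrics in the table above, LCC and SRCC, are commonly read as the Pearson linear correlation coefficient and the Spearman rank correlation coefficient between predicted and human-rated scores. The following is a minimal sketch of how the two could be computed, using only the Python standard library. The function names and the sample scores are illustrative assumptions, and the exact evaluation protocol used by these benchmarks is not stated here.

```python
from math import sqrt


def lcc(x, y):
    """Pearson linear correlation coefficient between two score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def srcc(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return lcc(average_ranks(x), average_ranks(y))


# Hypothetical data: human relevance ratings and model predictions.
human_scores = [1, 2, 3, 4, 5]
predicted_scores = [1, 4, 9, 16, 25]

print("LCC :", lcc(human_scores, predicted_scores))
print("SRCC:", srcc(human_scores, predicted_scores))
```

In this toy example the predictions preserve the ordering of the human ratings but not their linear spacing, so SRCC is 1 while LCC is slightly below 1. This illustrates why the two metrics can rank systems differently.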