
CaReAQA: A Cardiac and Respiratory Audio Question Answering Model for Open-Ended Diagnostic Reasoning

About

Medical audio signals, such as heart and lung sounds, play a crucial role in clinical diagnosis. However, analyzing these signals remains challenging: traditional methods rely on handcrafted features or supervised deep learning models that demand extensive labeled datasets, limiting their scalability and applicability. To address these issues, we propose CaReAQA, an audio-language model that integrates a foundation audio model with the reasoning capabilities of large language models, enabling clinically relevant, open-ended diagnostic responses. Alongside CaReAQA, we introduce CaReSound, a benchmark dataset of annotated medical audio recordings enriched with metadata and paired question-answer examples, intended to drive progress in diagnostic reasoning research. Evaluation results show that CaReAQA achieves 86.2% accuracy on open-ended diagnostic reasoning tasks, outperforming baseline models. It also generalizes well to closed-ended classification tasks, achieving an average accuracy of 56.9% on unseen datasets. Our findings show how audio-language integration and reasoning advance medical diagnostics, enabling efficient AI systems for clinical decision support.

Tsai-Ning Wang, Lin-Lin Chen, Neil Zeghidour, Aaqib Saeed • 2025
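The abstract describes coupling a foundation audio model with an LLM. The paper's actual architecture is not shown here, but the general audio-to-LLM fusion pattern (encode audio into frame embeddings, project them into the LLM's token-embedding space, prepend them to the question tokens) can be sketched in numpy. All dimensions, the random-projection "encoder", and the function names below are illustrative assumptions, not details from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (not from the paper): the audio encoder's embedding
# dimension and the LLM's hidden dimension.
AUDIO_DIM, LLM_DIM = 768, 4096

def encode_audio(waveform: np.ndarray, frame: int = 400) -> np.ndarray:
    """Stand-in for a frozen foundation audio model: chop the waveform
    into fixed-size frames and emit one AUDIO_DIM vector per frame."""
    n_frames = len(waveform) // frame
    frames = waveform[: n_frames * frame].reshape(n_frames, frame)
    # A fixed random projection stands in for learned encoder weights.
    W = rng.standard_normal((frame, AUDIO_DIM)) / np.sqrt(frame)
    return frames @ W

# A (here randomly initialized) linear projection maps audio embeddings
# into the LLM's token-embedding space so they can sit in the same
# sequence as the question's text tokens.
W_proj = rng.standard_normal((AUDIO_DIM, LLM_DIM)) / np.sqrt(AUDIO_DIM)

def build_llm_input(waveform: np.ndarray,
                    question_embeddings: np.ndarray) -> np.ndarray:
    audio_tokens = encode_audio(waveform) @ W_proj   # (T_audio, LLM_DIM)
    # Prepend audio tokens to the question tokens; the LLM then attends
    # over both when generating an open-ended answer.
    return np.concatenate([audio_tokens, question_embeddings], axis=0)

# Example: 1 second of heart-sound audio at 4 kHz plus a 12-token question.
wave = rng.standard_normal(4000)
question = rng.standard_normal((12, LLM_DIM))
seq = build_llm_input(wave, question)
print(seq.shape)  # → (22, 4096): 10 audio tokens + 12 text tokens
```

In real systems the encoder and projection are trained (or the projection alone, with the encoder frozen); the sketch only shows how the two modalities end up in one input sequence.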

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Multimodal Question Answering | CaReSound | Yes/No Accuracy | 93.12 | 13 |
| String-level response similarity | RA-QA Global, Discriminative tasks | BERTScore | 0.89 | 8 |
| String-level response similarity | RA-QA Multiple-choice, Discriminative tasks | BERTScore | 0.85 | 4 |
| Discriminative tasks | RA-QA | Accuracy | 67 | 4 |
| String-level response similarity | RA-QA Single-Verify, Discriminative tasks | BERTScore | 91 | 4 |
| Regression tasks | RA-QA | MAE | 2.61 | 3 |
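Several rows above report BERTScore, which measures string-level response similarity by greedily matching candidate and reference token embeddings on cosine similarity and combining the matches into an F1. A simplified sketch of that core computation, using placeholder embeddings rather than an actual BERT model (so the numbers are illustrative, not a reproduction of the benchmark scores):

```python
import numpy as np

def bertscore_f1(cand_emb: np.ndarray, ref_emb: np.ndarray) -> float:
    """Greedy-matching F1 over token embeddings (the core of BERTScore)."""
    c = cand_emb / np.linalg.norm(cand_emb, axis=1, keepdims=True)
    r = ref_emb / np.linalg.norm(ref_emb, axis=1, keepdims=True)
    sim = c @ r.T                       # pairwise cosine similarities
    precision = sim.max(axis=1).mean()  # best reference match per candidate token
    recall = sim.max(axis=0).mean()     # best candidate match per reference token
    return float(2 * precision * recall / (precision + recall))

rng = np.random.default_rng(0)
ref = rng.standard_normal((6, 16))                # placeholder token embeddings
cand = ref + 0.1 * rng.standard_normal((6, 16))   # a slightly perturbed "paraphrase"
print(bertscore_f1(cand, ref))  # close to 1.0; identical inputs give exactly 1.0
```

The full metric also applies inverse-document-frequency weighting and a baseline rescaling; production use would call the `bert-score` package with real contextual embeddings rather than this sketch.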
