Patch-Mix Contrastive Learning with Audio Spectrogram Transformer on Respiratory Sound Classification

About

Respiratory sound contains crucial information for the early diagnosis of fatal lung diseases. Since the COVID-19 pandemic, there has been growing interest in contact-free medical care based on electronic stethoscopes. To this end, cutting-edge deep learning models have been developed to diagnose lung diseases; however, this remains challenging due to the scarcity of medical data. In this study, we demonstrate that models pretrained on large-scale visual and audio datasets can be generalized to the respiratory sound classification task. In addition, we introduce a straightforward Patch-Mix augmentation, which randomly mixes patches between different samples, with the Audio Spectrogram Transformer (AST). We further propose a novel and effective Patch-Mix Contrastive Learning to distinguish the mixed representations in the latent space. Our method achieves state-of-the-art performance on the ICBHI dataset, outperforming the prior leading score by 4.08%.
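The abstract describes Patch-Mix only as randomly mixing patches between different samples. The snippet below is a minimal sketch of that idea, not the authors' implementation. It assumes the spectrogram has already been split into patch embeddings of shape (batch, num_patches, dim), that each sample is paired with another sample through a random permutation of the batch, and that the fraction of swapped patches is drawn from a Beta distribution. The function name `patch_mix`, the `alpha` parameter, and the Beta sampling are all hypothetical choices made for illustration.

```python
import numpy as np


def patch_mix(patches, rng, alpha=1.0):
    """Swap a random subset of each sample's patches with a partner's patches.

    patches: array of shape (batch, num_patches, dim).
    alpha:   hypothetical Beta(alpha, alpha) parameter for the mixing ratio.
    Returns (mixed, partner, mask, ratio).
    """
    batch, num_patches, _ = patches.shape

    # Assumption: one mixing ratio per batch, sampled as in mixup-style methods.
    lam = rng.beta(alpha, alpha)
    num_mixed = int(round(lam * num_patches))

    # Assumption: partners come from a random permutation of the batch.
    partner = rng.permutation(batch)

    # Pick, independently for each sample, which patch positions get replaced.
    mask = np.zeros((batch, num_patches), dtype=bool)
    for i in range(batch):
        idx = rng.choice(num_patches, size=num_mixed, replace=False)
        mask[i, idx] = True

    # Take the partner's patch where mask is True, keep the original elsewhere.
    mixed = np.where(mask[:, :, None], patches[partner], patches)

    # Fraction of patches actually replaced; usable for weighting the labels.
    ratio = num_mixed / num_patches
    return mixed, partner, mask, ratio


rng = np.random.default_rng(0)
patches = rng.normal(size=(4, 16, 8))  # toy batch: 4 samples, 16 patches each
mixed, partner, mask, ratio = patch_mix(patches, rng)
```

In a sketch like this, `ratio` and `partner` would let a training loop weight the loss between each sample's own label and its partner's label. How the paper itself combines labels or builds the contrastive objective is not stated in the abstract.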

Sangmin Bae, June-Woo Kim, Won-Yang Cho, Hyerim Baek, Soyoun Son, Byungjo Lee, Changwan Ha, Kyongpil Tae, Sungnyun Kim, Se-Young Yun • 2023

Related benchmarks

Task	Dataset	Result
Respiratory sound classification	ICBHI dataset official (60-40% split)	Score 62.37	67
4-class respiratory sound classification	ICBHI 60-40% split official (test)	Specificity 81.66	41
Respiratory sound classification	ICBHI 2017 (official)	Specificity 81.66	32
2-class respiratory sound classification	ICBHI 60-40% split official (test)	Specificity 81.66	23
Respiratory sound classification	AKGC417L (IND)	Overall Score 80.06	17
Respiratory sound classification	ICBHI 2017 (test)	Specificity 81.66	10
Respiratory sound classification	ICBHI In-Distribution	Specificity 81.66	9
Respiratory sound classification	SNUBH in-house clinical	Specificity 70.78	9
Respiratory sound classification	SPRSound Out-of-Distribution	Specificity 69.62	9
4-class respiratory sound classification	ICBHI 2017 (official 60-40% split)	Specificity 0.8166	8

Showing 10 of 21 rows
