Patch-Mix Contrastive Learning with Audio Spectrogram Transformer on Respiratory Sound Classification
About
Respiratory sound contains crucial information for the early diagnosis of fatal lung diseases. Since the COVID-19 pandemic, there has been growing interest in contact-free medical care based on electronic stethoscopes. To this end, cutting-edge deep learning models have been developed to diagnose lung diseases; however, the task remains challenging due to the scarcity of medical data. In this study, we demonstrate that a model pretrained on large-scale visual and audio datasets can be generalized to the respiratory sound classification task. In addition, we introduce a straightforward Patch-Mix augmentation, which randomly mixes patches between different samples, with the Audio Spectrogram Transformer (AST). We further propose a novel and effective Patch-Mix Contrastive Learning to distinguish the mixed representations in the latent space. Our method achieves state-of-the-art performance on the ICBHI dataset, outperforming the prior leading score by 4.08%.
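To make the idea of mixing patches between samples concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes the spectrograms have already been split into patch embeddings of shape `(batch, num_patches, dim)`. The function name `patch_mix`, the Beta-distributed mixing ratio, the `alpha` parameter, and the choice to share one set of swapped patch positions across the whole batch are all assumptions made for illustration.

```python
import numpy as np


def patch_mix(patches, rng, alpha=1.0):
    """Swap a random subset of patches with those of another sample.

    patches: array of shape (batch, num_patches, dim).
    Returns the mixed patches, the batch permutation that supplied the
    swapped-in patches, and the fraction of patches kept from the original.
    """
    batch, num_patches, _ = patches.shape

    # Assumption: the share of original patches to keep is drawn from Beta(alpha, alpha).
    lam = rng.beta(alpha, alpha)
    num_swap = int(round((1.0 - lam) * num_patches))

    # Pair every sample with a randomly chosen partner from the same batch.
    perm = rng.permutation(batch)

    # Assumption: the same patch positions are swapped for every sample in the batch.
    swap_idx = rng.choice(num_patches, size=num_swap, replace=False)

    mixed = patches.copy()
    mixed[:, swap_idx, :] = patches[perm][:, swap_idx, :]

    # Recompute the ratio after rounding so it matches the patches actually kept.
    kept_ratio = 1.0 - num_swap / num_patches
    return mixed, perm, kept_ratio


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16, 8))  # 4 samples, 16 patches each, 8-dim embeddings
mixed, perm, kept_ratio = patch_mix(x, rng)
```

The returned `perm` and `kept_ratio` identify which sample contributed the swapped-in patches and in what proportion, which is the information a mixed-label or contrastive objective would need.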
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Respiratory sound classification | ICBHI dataset official (60-40% split) | Specificity: 81.66 | 42 |
| Respiratory sound classification | ICBHI 2017 (official) | Specificity: 81.66 | 32 |
| 4-class respiratory sound classification | ICBHI 60-40% split official (test) | Specificity: 81.66 | 31 |
| 2-class respiratory sound classification | ICBHI 60-40% split official (test) | Specificity: 81.66 | 16 |
| Respiratory sound classification | ICBHI 2017 (test) | Specificity: 81.66 | 10 |
| 4-class respiratory sound classification | ICBHI 2017 (official 60-40% split) | Specificity: 0.8166 | 8 |
| 4-class Lung Sound Classification | ICBHI 2017 (test) | Specificity: 81.66 | 7 |