AnoPatch: Towards Better Consistency in Machine Anomalous Sound Detection

About

Large pre-trained models have demonstrated dominant performance in multiple areas, where consistency between pre-training and fine-tuning is the key to success. However, few works have reported satisfactory results with pre-trained models on the machine anomalous sound detection (ASD) task. This may be caused by a mismatch between the pre-trained model and the inductive bias of machine audio, which leads to inconsistency in both data and architecture. We therefore propose AnoPatch, which takes a ViT backbone pre-trained on AudioSet and fine-tunes it on machine audio. We believe that machine audio is more closely related to general audio datasets than to speech datasets, and that modeling it at the patch level suits the sparsity of machine audio. As a result, AnoPatch achieves state-of-the-art (SOTA) performance on the DCASE 2020 ASD dataset and the DCASE 2023 ASD dataset. We also compare multiple pre-trained models and empirically demonstrate that better consistency yields considerable improvement.

Anbai Jiang, Bing Han, Zhiqiang Lv, Yufeng Deng, Wei-Qiang Zhang, Xie Chen, Yanmin Qian, Jia Liu, Pingyi Fan • 2024
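
To make the "patch level" idea in the abstract concrete, the following is a minimal sketch of how a spectrogram can be cut into patches before being fed to a ViT-style model. It is an illustration only: the function name `patchify`, the 16x16 non-overlapping patch size, and the 128x64 spectrogram shape are assumptions chosen for the example, not the configuration used by AnoPatch.

```python
from typing import List

# A spectrogram stored as rows of mel bins, each row holding one value per frame.
Spectrogram = List[List[float]]


def patchify(spec: Spectrogram, patch_h: int = 16, patch_w: int = 16) -> List[List[float]]:
    """Split a spectrogram into non-overlapping patches, each flattened row by row.

    Hypothetical helper: patch size and stride are illustrative assumptions.
    Rows or frames that do not fill a whole patch are dropped.
    """
    n_mels = len(spec)
    n_frames = len(spec[0]) if spec else 0
    patches = []
    for top in range(0, n_mels - patch_h + 1, patch_h):
        for left in range(0, n_frames - patch_w + 1, patch_w):
            patch = [
                spec[r][c]
                for r in range(top, top + patch_h)
                for c in range(left, left + patch_w)
            ]
            patches.append(patch)
    return patches


# Toy input (assumed shape): 128 mel bins x 64 frames, filled with a running index.
spec = [[float(m * 64 + t) for t in range(64)] for m in range(128)]
tokens = patchify(spec)
print(len(tokens), len(tokens[0]))  # 32 patches, 256 values each
```

In a ViT, each flattened patch would then be linearly projected to an embedding and treated as one token, so a model can attend to small time-frequency regions rather than to whole frames.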

Related benchmarks

Task	Dataset	Metric	Result	
Anomalous Sound Detection	DCASE 2020 (dev)	Official Performance Metric	90.9	46
Anomalous Sound Detection	DCASE 2023 (eval)	Official Performance Score	74.2	17
Anomalous Sound Detection	DCASE 2023 (dev)	Performance Metric	64.2	17
Anomalous Sound Detection	DCASE 2020	Dataset-wise Harmonic Mean	92.6	16
Anomalous Sound Detection	DCASE 2023	Dataset-wise Harmonic Mean	68.8	16
Anomalous Sound Detection	DCASE 2024	Dataset-wise Harmonic Mean	65	16
Anomalous Sound Detection	DCASE 2024 (eval)	Official Performance Metric	66	16
Anomalous Sound Detection	DCASE 2020 (eval)	Official Performance Metric	94.3	15
Anomalous Sound Detection	DCASE 2024 (dev)	Performance Score	64.1	14
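
Several rows above report a harmonic mean. This page does not state exactly how the "Dataset-wise Harmonic Mean" is computed, so the sketch below only shows the general idea, assuming a plain harmonic mean over per-machine scores. The machine names and score values are made up for illustration and are not results from the paper.

```python
from statistics import harmonic_mean, mean

# Hypothetical per-machine scores in percent (illustrative values only).
scores = {
    "machine_a": 90.0,
    "machine_b": 80.0,
    "machine_c": 45.0,
}


def aggregate(per_machine: dict) -> float:
    """Aggregate per-machine scores with a harmonic mean (assumed aggregation)."""
    return harmonic_mean(per_machine.values())


hm = aggregate(scores)
am = mean(scores.values())
# The harmonic mean never exceeds the arithmetic mean, so one weak machine
# pulls the aggregate down more than it would under simple averaging.
print(f"harmonic mean: {hm:.2f}, arithmetic mean: {am:.2f}")
```

The harmonic mean rewards systems that perform consistently across machines, since a single low score lowers it more than it lowers an arithmetic average.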
