Dissecting Performance Degradation in Audio Source Separation under Sampling Frequency Mismatch

About

Audio processing methods based on deep neural networks are typically trained at a single sampling frequency (SF). To handle untrained SFs, signal resampling is commonly employed, but it can degrade performance, particularly when the input SF is lower than the trained SF. This paper investigates the causes of this degradation through two hypotheses: (i) the lack of high-frequency components introduced by up-sampling, and (ii) the greater importance of their presence than their precise representation. To examine these hypotheses, we compare conventional resampling with three alternatives: post-resampling noise addition, which adds Gaussian noise to the resampled signal; noisy-kernel resampling, which perturbs the kernel with Gaussian noise to enrich high-frequency components; and trainable-kernel resampling, which adapts the interpolation kernel through training. Experiments on music source separation show that noisy-kernel and trainable-kernel resampling alleviate the degradation observed with conventional resampling. We further demonstrate that noisy-kernel resampling is effective across diverse models, highlighting it as a simple yet practical option.
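To make two of the alternatives named in the abstract concrete, here is a minimal NumPy sketch of post-resampling noise addition and noisy-kernel resampling for integer-factor up-sampling. It is an illustration under stated assumptions, not the authors' implementation: the Hann-windowed sinc kernel, the function names, the `half_width` and `sigma` parameters, and the energy-ratio diagnostic are all hypothetical choices made here. Trainable-kernel resampling is omitted because it requires a training loop.

```python
import numpy as np

def sinc_kernel(factor, half_width=8):
    """Hann-windowed sinc interpolation kernel (assumed kernel design)."""
    n = np.arange(-half_width * factor, half_width * factor + 1)
    window = 0.5 * (1.0 + np.cos(np.pi * n / (half_width * factor)))
    return np.sinc(n / factor) * window

def upsample(x, factor, kernel):
    """Insert zeros between samples, then filter with the given kernel."""
    stuffed = np.zeros(len(x) * factor)
    stuffed[::factor] = x
    return np.convolve(stuffed, kernel, mode="same")

def noisy_kernel(kernel, sigma, rng):
    """Noisy-kernel resampling: perturb the kernel taps with Gaussian noise."""
    return kernel + sigma * rng.standard_normal(kernel.shape)

def add_post_noise(y, sigma, rng):
    """Post-resampling noise addition: add Gaussian noise to the output."""
    return y + sigma * rng.standard_normal(y.shape)

def high_band_energy_ratio(y, factor):
    """Fraction of spectral energy above the input signal's Nyquist frequency."""
    spec = np.abs(np.fft.rfft(y)) ** 2
    cutoff = len(spec) // factor
    return spec[cutoff:].sum() / spec.sum()

# Example: up-sample a 1-second, 8 kHz test tone mixture to 16 kHz.
rng = np.random.default_rng(0)
t = np.arange(8000) / 8000.0
x = sum(np.sin(2 * np.pi * f * t) for f in (220.0, 440.0, 880.0, 1760.0))
kernel = sinc_kernel(factor=2)

y_conv = upsample(x, 2, kernel)                              # conventional
y_post = add_post_noise(y_conv, sigma=0.1, rng=rng)          # post-noise
y_noisy = upsample(x, 2, noisy_kernel(kernel, 0.05, rng))    # noisy kernel

for name, y in [("conventional", y_conv), ("post-noise", y_post),
                ("noisy-kernel", y_noisy)]:
    print(name, high_band_energy_ratio(y, 2))
```

In this sketch the perturbed kernel no longer suppresses the spectral images created by zero insertion, so the high-frequency content of the noisy-kernel output depends on the input signal, whereas post-resampling noise is independent of it. Whether that difference explains the results reported in the abstract is for the paper to establish; the sketch only shows the mechanics.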

Kanami Imamura, Tomohiko Nakamura, Kohei Yatabe, Hiroshi Saruwatari • 2026

Related benchmarks

Task	Dataset	Result	Rank
Music Source Separation	MUSDB18 HQ	Vocals Score: 10.83	21
