Filtered Noise Shaping for Time Domain Room Impulse Response Estimation From Reverberant Speech

About

Deep learning approaches have emerged that aim to transform an audio signal so that it sounds as if it were recorded in the same room as a reference recording, with applications in both audio post-production and augmented reality. In this work, we propose FiNS, a Filtered Noise Shaping network that directly estimates the time domain room impulse response (RIR) from reverberant speech. Our domain-inspired architecture features a time domain encoder and a filtered noise shaping decoder that models the RIR as a summation of decaying filtered noise signals, along with direct sound and early reflection components. Previous methods for acoustic matching either use large models to transform audio to match the target room or predict parameters for algorithmic reverberators. Instead, blind estimation of the RIR enables efficient and realistic transformation with a single convolution. An evaluation demonstrates that our model not only synthesizes RIRs that match parameters of the target room, such as the $T_{60}$ and direct-to-reverberant ratio (DRR), but also, as shown in a listening test against deep learning baselines, more accurately reproduces the perceptual characteristics of the target room.
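The abstract describes the decoder's output as a sum of decaying filtered noise signals plus direct sound and early reflections, which is then applied to dry audio with a single convolution. The sketch below illustrates only that signal model, not the FiNS network itself. The band centres, Q values, gains, per-band T60 values, and early-reflection taps are hypothetical hand-set parameters (in FiNS they are predicted from reverberant speech), and an RBJ-cookbook band-pass biquad is an assumed stand-in for whatever filterbank the paper uses.

```python
import math
import random


def biquad_bandpass(x, fs, f0, q):
    """Filter x with an RBJ-cookbook band-pass biquad (0 dB peak gain)."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    b0, b1, b2 = alpha, 0.0, -alpha
    a0, a1, a2 = 1.0 + alpha, -2.0 * math.cos(w0), 1.0 - alpha
    # Normalize all coefficients by a0.
    b0, b1, b2, a1, a2 = b0 / a0, b1 / a0, b2 / a0, a1 / a0, a2 / a0
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for xn in x:
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        y.append(yn)
        x2, x1 = x1, xn
        y2, y1 = y1, yn
    return y


def synth_rir(fs, length_s, bands, early, seed=0):
    """Sum of exponentially decaying filtered-noise bands plus sparse early taps.

    bands: list of (center_hz, q, gain, t60_s)  -- hypothetical parameters
    early: list of (delay_s, amplitude)         -- direct sound + early reflections
    """
    rng = random.Random(seed)
    n = int(length_s * fs)
    rir = [0.0] * n
    for center_hz, q, gain, t60_s in bands:
        noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
        band = biquad_bandpass(noise, fs, center_hz, q)
        for i in range(n):
            # Amplitude envelope that has fallen by 60 dB after t60_s seconds.
            env = 10.0 ** (-3.0 * (i / fs) / t60_s)
            rir[i] += gain * env * band[i]
    for delay_s, amp in early:
        idx = int(round(delay_s * fs))
        if 0 <= idx < n:
            rir[idx] += amp
    return rir


def convolve(x, h):
    """Direct-form linear convolution; output length is len(x) + len(h) - 1."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        if xi == 0.0:
            continue  # skip zero input samples
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


if __name__ == "__main__":
    fs = 8000
    # Hypothetical (center Hz, Q, gain, T60 s) per band.
    bands = [(250.0, 1.0, 0.3, 0.6), (1000.0, 1.0, 0.2, 0.4), (3000.0, 1.0, 0.1, 0.25)]
    # Hypothetical direct sound at t = 0 plus two early reflections.
    early = [(0.0, 1.0), (0.004, 0.5), (0.009, -0.3)]
    rir = synth_rir(fs, 0.25, bands, early)
    dry = [1.0] + [0.0] * 99  # placeholder for dry speech samples
    wet = convolve(dry, rir)
    print(len(rir), len(wet))
```

Because the transformation is a single convolution, a long RIR would normally be applied with an FFT-based convolution; the direct-form loop is used here only for readability.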

Christian J. Steinmetz, Vamsi Krishna Ithapu, Paul Calamia • 2021

Related benchmarks

Task	Dataset	Metric	Result
Room Impulse Response Estimation	SoundSpaces-Speech	RT60 Error (ms)	87.7	18
Blind Room Impulse Response (RIR) Estimation	BUTReverbDB and OpenAIR Out-of-domain	T60 PAE (%)	14.2	7
Direct-to-Reverberant Ratio Estimation	SimACE (test)	MAE	2.153	5
Reverberation Time Estimation	SimACE (test)	MAE	0.113	5
Clarity Estimation	SimACE (test)	MAE	6.489	5
Early Reflection Estimation	SimACE (test)	RMSE	0.067	5
Blind RIR reconstruction	LibriSpeech and merged RIR datasets	RT60 (s)	0.167	4
Blind C50 Estimation	RIR-based dataset identity split (test)	MAE C50 (dB)	2.9	4
Blind T60 Estimation	RIR-based dataset RIR identity (test)	MAPE T60 (%)	29.08	4
RIR reconstruction	RIR-based dataset RIR identity (test)	MAE rec (dB)	9.59	4
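Several of the benchmark rows above score errors on acoustic parameters derived from an RIR (T60/RT60, DRR, C50). As a hedged illustration of how such parameters are commonly computed, and not necessarily the exact procedure used by FiNS or by each benchmark, the sketch below estimates T60 from a Schroeder energy decay curve fitted between -5 and -25 dB and extrapolated to -60 dB, DRR with an assumed ±2.5 ms direct-sound window around the peak, and C50 with a 50 ms split after the peak. The fit range and window lengths are assumed conventions; benchmarks differ in these choices.

```python
import math


def schroeder_edc_db(h):
    """Schroeder backward-integrated energy decay curve, normalized to 0 dB."""
    edc = [0.0] * len(h)
    acc = 0.0
    for i in range(len(h) - 1, -1, -1):
        acc += h[i] * h[i]
        edc[i] = acc
    total = edc[0]
    return [10.0 * math.log10(e / total) if e > 0 else -math.inf for e in edc]


def estimate_t60(h, fs, hi_db=-5.0, lo_db=-25.0):
    """Least-squares line fit to the EDC between hi_db and lo_db, extrapolated to -60 dB."""
    edc = schroeder_edc_db(h)
    pts = [(i / fs, d) for i, d in enumerate(edc) if lo_db <= d <= hi_db]
    if len(pts) < 2:
        raise ValueError("EDC does not cover the requested decay range")
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_d = sum(d for _, d in pts) / n
    cov = sum((t - mean_t) * (d - mean_d) for t, d in pts)
    var = sum((t - mean_t) ** 2 for t, _ in pts)
    slope = cov / var  # dB per second, negative for a decaying RIR
    return -60.0 / slope


def estimate_drr(h, fs, half_window_s=0.0025):
    """DRR (dB): energy within +/- half_window_s of the peak vs. all remaining energy."""
    peak = max(range(len(h)), key=lambda i: abs(h[i]))
    w = int(half_window_s * fs)
    direct = sum(v * v for v in h[max(0, peak - w):peak + w + 1])
    total = sum(v * v for v in h)
    return 10.0 * math.log10(direct / (total - direct))


def estimate_c50(h, fs):
    """C50 (dB): energy in the first 50 ms from the peak vs. the energy after it."""
    peak = max(range(len(h)), key=lambda i: abs(h[i]))
    split = peak + int(0.050 * fs)
    early = sum(v * v for v in h[peak:split])
    late = sum(v * v for v in h[split:])
    return 10.0 * math.log10(early / late)
```

For a noiseless exponential decay the EDC is a straight line in dB, so the fit recovers the T60 used to generate it; on measured or estimated RIRs the noise floor and fit range noticeably affect the result.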
