Few-Shot Audio-Visual Learning of Environment Acoustics
About
Room impulse response (RIR) functions capture how the surrounding physical environment transforms the sounds heard by a listener, with implications for various applications in AR, VR, and robotics. Whereas traditional methods to estimate RIRs assume dense geometry and/or sound measurements throughout the environment, we explore how to infer RIRs based on a sparse set of images and echoes observed in the space. Towards that goal, we introduce a transformer-based method that uses self-attention to build a rich acoustic context, then predicts RIRs of arbitrary query source-receiver locations through cross-attention. Additionally, we design a novel training objective that improves the match in the acoustic signature between the RIR predictions and the targets. In experiments using a state-of-the-art audio-visual simulator for 3D environments, we demonstrate that our method successfully generates arbitrary RIRs, outperforming state-of-the-art methods and -- in a major departure from traditional methods -- generalizing to novel environments in a few-shot manner. Project: http://vision.cs.utexas.edu/projects/fs_rir.
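The pipeline described above — self-attention that fuses a sparse set of image/echo observations into an acoustic context, then cross-attention from an arbitrary query source-receiver location to predict its RIR — can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's implementation: all shapes, the random projection used as a stand-in "decoder", and the single-head dot-product attention are assumptions for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # single-head scaled dot-product attention (simplified stand-in
    # for the transformer blocks described in the abstract)
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores, axis=-1) @ v

d = 32        # embedding dimension (hypothetical)
n_obs = 4     # few-shot: sparse set of image+echo observations
rng = np.random.default_rng(0)

# fused image/echo embeddings of the observed locations (hypothetical features)
context = rng.normal(size=(n_obs, d))
# embedding of the query source-receiver pose (hypothetical)
query = rng.normal(size=(1, d))

# self-attention builds a rich acoustic context from the sparse observations
ctx = attention(context, context, context)
# cross-attention: the query pose attends into the acoustic context
fused = attention(query, ctx, ctx)
# decode the fused feature to an RIR waveform; here a random linear
# projection stands in for the learned decoder (1 s at 16 kHz, assumed)
rir = fused @ rng.normal(size=(d, 16000))
print(rir.shape)  # (1, 16000)
```

In the actual method, the context and query embeddings would come from learned visual/audio encoders and positional encodings, and the decoder would be trained with the acoustic-signature matching objective mentioned above; the sketch only shows how self- and cross-attention divide the work.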
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Novel-view Sound Synthesis | Soundspace-Ambient (Unseen Scenes) | STFT | 5.457 | 15 |
| Novel-view Sound Synthesis | Soundspace-Ambient (Seen Scenes) | STFT | 5.937 | 15 |
| Room Impulse Response (RIR) Prediction | Matterport3D (Seen environments) | STFT | 1.1 | 9 |
| Room Impulse Response (RIR) Prediction | Matterport3D (Unseen environments) | STFT | 1.22 | 9 |
| Binaural audio synthesis | N2S (test) | STFT | 1.765 | 9 |
| Novel-view Sound Synthesis | N2S Benchmark real-world scene | STFT Error | 1.765 | 9 |
| Depth Estimation | Environments (unseen) | DPE | 1.45 | 7 |
| Sound Source Localization | Environments (unseen) | SLE | 64.6 | 7 |
| Sound Source Localization | Environments (seen) | SLE | 50.3 | 6 |
| Depth Estimation | Environments (seen) | DPE | 135 | 6 |