Physics-Guided Variational Model for Unsupervised Sound Source Tracking

About

Sound source tracking is commonly performed using classical array-processing algorithms, while machine-learning approaches typically rely on precise source position labels that are expensive or impractical to obtain. This paper introduces a physics-guided variational model capable of fully unsupervised single-source sound source tracking. The method combines a variational encoder with a physics-based decoder that injects geometric constraints into the latent space through analytically derived pairwise time-delay likelihoods. Without requiring ground-truth labels, the model learns to estimate source directions directly from microphone array signals. Experiments on real-world data demonstrate that the proposed approach outperforms traditional baselines and achieves accuracy and computational complexity comparable to state-of-the-art supervised models. We further show that the method generalizes well to mismatched array geometries and exhibits strong robustness to corrupted microphone position metadata. Finally, we outline a natural extension of the approach to multi-source tracking and present the theoretical modifications required to support it.
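As a rough illustration of the geometric constraint behind pairwise time delays, the sketch below computes far-field time differences of arrival for each microphone pair from a candidate azimuth and scores observed delays with a Gaussian log-likelihood. This is not the paper's decoder: the 2-D far-field model, the speed of sound of 343 m/s, the Gaussian noise model, and the function names `pairwise_tdoa` and `gaussian_delay_loglik` are all assumptions made for illustration.

```python
import itertools
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value, not taken from the paper


def pairwise_tdoa(mic_positions, azimuth_rad, c=SPEED_OF_SOUND):
    """Far-field delays t_i - t_j for every microphone pair (i < j).

    mic_positions: (M, 2) array of microphone coordinates in metres.
    azimuth_rad:   candidate source direction, measured from the x-axis.
    """
    # Unit vector pointing from the array towards the source.
    u = np.array([np.cos(azimuth_rad), np.sin(azimuth_rad)])
    # A plane wave reaches microphones closer to the source earlier.
    arrival = -(mic_positions @ u) / c
    pairs = list(itertools.combinations(range(len(mic_positions)), 2))
    delays = np.array([arrival[i] - arrival[j] for i, j in pairs])
    return pairs, delays


def gaussian_delay_loglik(observed, predicted, sigma):
    """Sum of independent Gaussian log-likelihoods over microphone pairs.

    The Gaussian form is an assumed stand-in for the paper's likelihood.
    """
    resid = (np.asarray(observed) - np.asarray(predicted)) / sigma
    return float(np.sum(-0.5 * resid**2 - np.log(sigma * np.sqrt(2 * np.pi))))


# Hypothetical two-microphone array with 10 cm spacing along the x-axis.
mics = np.array([[0.0, 0.0], [0.1, 0.0]])
pairs, delays = pairwise_tdoa(mics, azimuth_rad=0.0)
```

Evaluating the log-likelihood over a grid of candidate azimuths and taking the maximum gives a simple label-free direction estimate, which conveys the idea of letting array geometry, rather than position labels, supervise the estimate.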

Luan Vinícius Fiorio, Ivana Nikoloska, Bruno Defraene, Alex Young, Johan David, Ronald M. Aarts • 2026

Related benchmarks

Task	Dataset	Result	
Direction of Arrival Estimation	Simulated data (Experiment 3)	RMSAE 6.3	4
Direction of Arrival Estimation	LOCATA Experiment 3	RMS Angular Error 8.2	4
Direction of Arrival Estimation	Simulated data Experiment 1	RMSAE 5.5	4
Direction of Arrival Estimation	LOCATA Experiment 1	RMSAE 8.8	4
DOA estimation	Simulated data Experiment 2 - directional AWGN	RMS Angular Error 3.8	4
DOA estimation	LOCATA Experiment 2 - directional AWGN	RMSAE 8	4
Direction of Arrival Estimation	LOCATA (test)	Params (M) 0.89	3
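
For readers unfamiliar with the metric, the sketch below shows one common way to compute a root-mean-square angular error (RMSAE) between estimated and reference azimuths, wrapping differences so that 350° and 10° count as 20° apart. Whether the benchmark results above use exactly this definition, and whether they are in degrees, is an assumption; the function name `rms_angular_error` is hypothetical.

```python
import numpy as np


def rms_angular_error(est_deg, ref_deg):
    """RMS of wrapped angular differences, in degrees (assumed convention)."""
    est = np.asarray(est_deg, dtype=float)
    ref = np.asarray(ref_deg, dtype=float)
    # Wrap each difference into the interval [-180, 180).
    diff = (est - ref + 180.0) % 360.0 - 180.0
    return float(np.sqrt(np.mean(diff**2)))


# Illustrative values only, not data from the paper.
example = rms_angular_error([350.0, 12.0], [10.0, 12.0])
```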
