Physics-Guided Variational Model for Unsupervised Sound Source Tracking
About
Sound source tracking is commonly performed using classical array-processing algorithms, while machine-learning approaches typically rely on precise source position labels that are expensive or impractical to obtain. This paper introduces a physics-guided variational model capable of fully unsupervised tracking of a single sound source. The method combines a variational encoder with a physics-based decoder that injects geometric constraints into the latent space through analytically derived pairwise time-delay likelihoods. Without requiring ground-truth labels, the model learns to estimate source directions directly from microphone array signals. Experiments on real-world data demonstrate that the proposed approach outperforms traditional baselines and achieves accuracy and computational complexity comparable to state-of-the-art supervised models. We further show that the method generalizes well to mismatched array geometries and exhibits strong robustness to corrupted microphone position metadata. Finally, we outline a natural extension of the approach to multi-source tracking and present the theoretical modifications required to support it.
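To make the idea of a pairwise time-delay likelihood concrete, the sketch below shows how a candidate source direction and known microphone positions determine the expected time difference of arrival (TDOA) for every microphone pair, and how observed delays could then be scored against that prediction. It is an illustration only, not the paper's decoder: the far-field (plane-wave) model, the speed of sound of 343 m/s, the isotropic Gaussian noise with standard deviation `sigma`, and all function names are assumptions made here.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value, not taken from the paper


def direction_vector(azimuth, elevation):
    """Unit vector pointing from the array toward the source (angles in radians)."""
    return np.array([
        np.cos(elevation) * np.cos(azimuth),
        np.cos(elevation) * np.sin(azimuth),
        np.sin(elevation),
    ])


def pairwise_tdoas(mic_positions, azimuth, elevation, c=SPEED_OF_SOUND):
    """Far-field TDOA t_i - t_j (seconds) for every microphone pair i < j.

    mic_positions: array of shape (M, 3), in metres.
    """
    u = direction_vector(azimuth, elevation)
    # Under a plane-wave model, a microphone further along u hears the source earlier.
    arrival = -(mic_positions @ u) / c
    i, j = np.triu_indices(len(mic_positions), k=1)
    return arrival[i] - arrival[j]


def tdoa_log_likelihood(observed, mic_positions, azimuth, elevation, sigma=1e-4):
    """Gaussian log-likelihood of observed pairwise delays (hypothetical noise model)."""
    residual = observed - pairwise_tdoas(mic_positions, azimuth, elevation)
    return (-0.5 * np.sum((residual / sigma) ** 2)
            - residual.size * np.log(sigma * np.sqrt(2.0 * np.pi)))


# Two microphones 10 cm apart on the x-axis, source along +x.
mics = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0]])
observed = pairwise_tdoas(mics, azimuth=0.0, elevation=0.0)
ll_true = tdoa_log_likelihood(observed, mics, 0.0, 0.0)
ll_wrong = tdoa_log_likelihood(observed, mics, np.pi / 2, 0.0)
```

In this toy setup the log-likelihood is highest at the direction that generated the delays, which is the kind of geometric constraint a physics-based decoder can impose on a latent direction without position labels.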
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Direction of Arrival Estimation | Simulated data (Experiment 3) | RMSAE | 6.3 | 4 |
| Direction of Arrival Estimation | LOCATA Experiment 3 | RMS Angular Error | 8.2 | 4 |
| Direction of Arrival Estimation | Simulated data Experiment 1 | RMSAE | 5.5 | 4 |
| Direction of Arrival Estimation | LOCATA Experiment 1 | RMSAE | 8.8 | 4 |
| DOA estimation | Simulated data Experiment 2 - directional AWGN | RMS Angular Error | 3.8 | 4 |
| DOA estimation | LOCATA Experiment 2 - directional AWGN | RMSAE | 8 | 4 |
| Direction of Arrival Estimation | LOCATA (test) | Params (M) | 0.89 | 3 |
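
For readers unfamiliar with the metric reported in most rows above, the following sketch computes a root-mean-square angular error between estimated and reference direction vectors. It follows the common definition (great-circle angle between unit vectors, then RMS over frames). Whether the benchmark figures were computed exactly this way, and in degrees, is an assumption; the function names are hypothetical.

```python
import numpy as np


def angular_error_deg(u_est, u_ref):
    """Angle in degrees between estimated and reference directions, per frame.

    u_est, u_ref: arrays of shape (T, 3); they need not be normalised.
    """
    u_est = u_est / np.linalg.norm(u_est, axis=-1, keepdims=True)
    u_ref = u_ref / np.linalg.norm(u_ref, axis=-1, keepdims=True)
    # Clip guards against rounding slightly outside [-1, 1] before arccos.
    cosine = np.clip(np.sum(u_est * u_ref, axis=-1), -1.0, 1.0)
    return np.degrees(np.arccos(cosine))


def rms_angular_error_deg(u_est, u_ref):
    """Root-mean-square of the per-frame angular errors, in degrees."""
    errors = angular_error_deg(u_est, u_ref)
    return float(np.sqrt(np.mean(errors ** 2)))


# Two frames: one perfect estimate, one that is off by 90 degrees.
estimates = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
references = np.array([[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
rmsae = rms_angular_error_deg(estimates, references)
```

Because the errors are squared before averaging, a few badly wrong frames raise this metric more than they would raise a mean absolute angular error.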