
Discriminative Sounding Objects Localization via Self-supervised Audiovisual Matching

About

Discriminatively localizing sounding objects in cocktail-party (i.e., mixed-sound) scenes is commonplace for humans but still challenging for machines. In this paper, we propose a two-stage learning framework for self-supervised, class-aware sounding object localization. First, we learn robust object representations by aggregating candidate sound localization results in single-source scenes. Then, in cocktail-party scenarios, class-aware object localization maps are generated by referring to the pre-learned object knowledge, and the sounding objects are selected by matching the audio and visual object category distributions, with audiovisual consistency serving as the self-supervised signal. Experimental results on both realistic and synthesized cocktail-party videos demonstrate that our model is superior at filtering out silent objects and at pointing out the locations of sounding objects of different classes. Code is available at https://github.com/DTaoo/Discriminative-Sounding-Objects-Localization.
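The abstract describes the two stages only in words. The sketch below is a minimal, hypothetical illustration in plain Python, not the authors' implementation (the linked repository has that). It assumes: (1) an object representation is a localization-weighted average of frame features from a single-source scene; (2) a class-aware map is the cosine similarity between each location's feature and each object representation; (3) the visual category distribution is a softmax over class maps pooled under a sound localization map; and (4) the matching loss is KL(audio ‖ visual). All function names, the temperature of 0.1, and the KL direction are illustrative choices.

```python
import math
from typing import List

Vector = List[float]
Grid = List[List[float]]          # H x W scalar map
FeatureMap = List[List[Vector]]   # H x W grid of D-dim features


def cosine(u: Vector, v: Vector, eps: float = 1e-8) -> float:
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + eps)


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_object_feature(features: FeatureMap, loc_map: Grid, eps: float = 1e-8) -> Vector:
    """Stage 1 (assumed form): localization-weighted average of a single-source frame's features."""
    dim = len(features[0][0])
    acc = [0.0] * dim
    total = 0.0
    for frow, lrow in zip(features, loc_map):
        for f, w in zip(frow, lrow):
            total += w
            for i in range(dim):
                acc[i] += w * f[i]
    return [a / (total + eps) for a in acc]


def class_aware_maps(features: FeatureMap, prototypes: List[Vector]) -> List[Grid]:
    """Stage 2: one H x W similarity map per object representation (K x H x W)."""
    return [[[cosine(f, p) for f in row] for row in features] for p in prototypes]


def visual_distribution(maps: List[Grid], sound_map: Grid, temperature: float = 0.1) -> Vector:
    """Pool each class map under the sound localization map, then softmax over classes.

    Weighting by sound_map down-weights objects in regions the audio does not point to.
    """
    total_weight = sum(sum(row) for row in sound_map) + 1e-8
    pooled = []
    for cls_map in maps:
        s = sum(c * w for crow, wrow in zip(cls_map, sound_map) for c, w in zip(crow, wrow))
        pooled.append(s / total_weight / temperature)
    return softmax(pooled)


def kl_divergence(p: Vector, q: Vector, eps: float = 1e-8) -> float:
    """KL(p || q) for two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def matching_loss(audio_logits: Vector, maps: List[Grid], sound_map: Grid) -> float:
    """Audiovisual consistency signal: KL(audio categories || visual categories)."""
    return kl_divergence(softmax(audio_logits), visual_distribution(maps, sound_map))


if __name__ == "__main__":
    # Toy 1 x 2 grid with 2-D features and two hypothetical object classes A and B.
    features = [[[1.0, 0.0], [0.0, 1.0]]]
    proto_a = weighted_object_feature(features, [[1.0, 0.0]])  # single-source frame of class A
    proto_b = weighted_object_feature(features, [[0.0, 1.0]])  # single-source frame of class B
    maps = class_aware_maps(features, [proto_a, proto_b])
    sound_map = [[1.0, 0.0]]  # audio localizes the left cell only
    print(matching_loss([5.0, 0.0], maps, sound_map))  # audio favors A: small loss
    print(matching_loss([0.0, 5.0], maps, sound_map))  # audio favors B: large loss
```

In the toy run, the sound map points at the cell that matches class A, so audio logits favoring A give a much smaller loss than logits favoring B; minimizing such a loss is one way to turn audiovisual consistency into a training signal without labels.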

Di Hu, Rui Qian, Minyue Jiang, Xiao Tan, Shilei Wen, Errui Ding, Weiyao Lin, Dejing Dou• 2020

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Multi-sound source localization | MUSIC-Duet (test) | CIoU@0.3 | 38.8 | 23 |
| Multi-sound source localization | VGGSound-Duet (test) | CIoU@0.3 | 36.9 | 23 |
| Single-source sound localization | VGGSound single-source (test) | IoU@0.5 | 46.8 | 23 |
| Sound Localization | MUSIC-Solo 1.0 (test) | IoU@0.5 | 62.7 | 22 |
| Visual Sound Source Localization | VGG-SS (test) | LocAcc | 29.91 | 19 |
| Visual Sound Source Localization | Flickr SoundNet (test) | LocAcc | 74 | 18 |
| Multi-source sound localization | VGGSound Instruments (test) | CIoU@0.1 | 85.9 | 13 |
| Single-source sound localization | VGGSound Instruments (test) | IoU@0.3 | 61.6 | 13 |
| Visual Sound Source Localization | Flickr-SoundNet extended (test) | LocAcc | 72.91 | 11 |
| Visual Sound Source Localization | VGG-SS extended (test) | Localization Accuracy | 26.87 | 11 |
Showing 10 of 22 rows
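The Result column mixes several metrics. As a rough guide, IoU@τ in the sound-localization literature is commonly reported as the fraction of test samples whose binarized localization map overlaps the ground-truth region with IoU ≥ τ, and CIoU is, as we understand it, a class-aware IoU averaged only over the classes that are actually sounding. The sketch below assumes those readings plus a fixed binarization threshold; the function names and threshold are hypothetical, and the paper should be checked for the exact protocol (how maps are binarized, how CIoU@τ is aggregated). LocAcc is not covered here.

```python
from typing import List, Sequence

Mask = List[List[bool]]


def binarize(heatmap: List[List[float]], threshold: float) -> Mask:
    """Turn a localization heatmap into a binary mask (fixed threshold is an assumption)."""
    return [[v >= threshold for v in row] for row in heatmap]


def iou(pred: Mask, gt: Mask) -> float:
    """Intersection over union of two same-sized binary masks."""
    inter = sum(p and g for prow, grow in zip(pred, gt) for p, g in zip(prow, grow))
    union = sum(p or g for prow, grow in zip(pred, gt) for p, g in zip(prow, grow))
    return inter / union if union else 0.0


def success_rate(ious: Sequence[float], tau: float) -> float:
    """IoU@tau read as the fraction of samples whose IoU is at least tau."""
    return sum(1 for v in ious if v >= tau) / len(ious)


def class_aware_iou(per_class_iou: Sequence[float], sounding: Sequence[bool]) -> float:
    """Average IoU over the classes flagged as sounding (assumed CIoU form)."""
    n = sum(1 for s in sounding if s)
    if n == 0:
        return 0.0
    return sum(v for v, s in zip(per_class_iou, sounding) if s) / n


if __name__ == "__main__":
    pred = binarize([[0.9, 0.2], [0.8, 0.1]], threshold=0.5)
    gt = [[True, True], [False, False]]
    print(iou(pred, gt))  # 1 overlapping cell out of 3 covered cells
    print(success_rate([0.6, 0.4, 0.5, 0.2], tau=0.5))  # 2 of 4 samples pass
    print(class_aware_iou([0.8, 0.1, 0.4], [True, False, True]))  # silent class ignored
```

Restricting the average to sounding classes means a model is not rewarded for localizing a silent object, which matches the abstract's emphasis on filtering out silent objects.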
