| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Visual Sound Source Localization | VGG-SS (test) | LocAcc 39.8 | 19 |
| Audio-visual localization | VGG-SS Open set (Unheard 110) | AP 39.24 | 14 |
| Audio-visual localization | VGG-SS Open set (Heard 110) | AP 40.84 | 14 |
| Visual Sound Source Localization | VGG-SS extended (test) | Localization Accuracy 39.8 | 11 |
| Audio referred image grounding | VGG-SS (test) | cIoU 48.51 | 10 |