| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Audio-Visual Segmentation | AVSBench S4 v1 (test) | MJ | 86.2 | 55 |
| Audio-Visual Segmentation | AVSBench MS3 v1 (test) | Mean Jaccard | 67.6 | 37 |
| Audio-Visual Segmentation | AVSBench MS3 (test) | Jaccard Index (IoU) | 65 | 30 |
| Audio-Visual Semantic Segmentation | AVSBench AVSS v1 (test) | MJ | 51.2 | 29 |
| Sound Target Segmentation | AVSBench-object MS3 1.0 (test) | mIoU | 59.2 | 23 |
| Audio-Visual Segmentation | AVSBench AVS-Objects-MS3 | J & F Score | 75.1 | 21 |
| Audio-Visual Segmentation | AVSBench AVS-Objects-S4 | J&F Score | 92.4 | 21 |
| Audio-Visual Segmentation | AVSBench S4 (test) | MJ | 81.9 | 16 |
| Audio-Visual Segmentation | AVSBench AVS-Semantic | J (Jaccard) | 49.7 | 13 |
| Sound Source Segmentation | AVSBench | mIoU | 36.37 | 10 |
| Audio referred image grounding | AVSBench (test) | cIoU | 79.82 | 10 |
| Audio-Visual Semantic Segmentation | AVSBench Semantic (test) | mIoU | 50.1 | 8 |
| Audio-Visual Segmentation | AVSBench MS3 setting (test) | MJ Score | 55.1 | 6 |
| Audio-visual segmentation | AVSBench Multi-source | mIoU | 63.5 | 5 |
| Audio-visual segmentation | AVSBench Single-source | mIoU | 89.4 | 5 |
| Audio-Visual Segmentation | AVSBench Single Source (test) | mIoU | 34.13 | 5 |
| Audio-Visual Segmentation | AVSBench S4 | mIoU | 80.4 | 5 |