| Task Name | Dataset Name | SOTA Model | SOTA Result | Trend |
|---|---|---|---|---|
| Audio-Visual Sound Segmentation | AVISeg (test) | FSLA | 44.12 | 12 |
| Audio-Visual Instance Segmentation | AVISeg | FSLA | 54.65 | 8 |
| Video Instance Segmentation | AVISeg (test) | FSLA | 42.78 | 7 |
| Audio-Referred Visual Grounding | AVISeg (test) | FSLA | 18.55 | 4 |