| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Audio-visual video parsing (Event-level) | LLP (test) | Acc (A) 58.6 | 15 |
| Audio-visual video parsing (Segment-level) | LLP (test) | Audio Score 65.9 | 15 |
| Audio-Visual Video Parsing | LLP 1.0 (test) | Segment-level Audio 68.1 | 13 |
| Audio-Visual Video Parsing | LLP (test) | Audio Segment Score 63.8 | 11 |
| Image Guided Audio Temporal Localization | LLP (test) | F1 Score 54.96 | 5 |
| Audio-Visual Event Parsing | LLP (test) | Audio Segment Score 64.1 | 4 |
| Audio-visual video parsing | LLP (val) | Segment-Level Audio Accuracy 63.5 | 3 |