| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Event Classification (V → A) | VGGSound-AVEL 90K | Precision 50.8 | 11 |
| Cross-modal classification (Audio to Visual) | VGGSound-AVEL UCF to VGG 40K | Precision 66.5 | 4 |
| Cross-dataset domain transfer (Visual to Audio) | VGGSound-AVEL 40K AVE to AVVP | Segment-level F1 56.3 | 4 |