| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Audio Classification | VGG-Sound | Top-1 Accuracy: 88.15 | 83 |
| Audio Classification Attribution | VGG-Sound (val) | Deletion AUC: 14.56 | 28 |
| Audio recognition | VGG-Sound (test) | Top-1 Acc: 64.1 | 22 |
| Video-to-Audio Generation | VGG-Sound | Fréchet Distance (FD): 0 | 10 |
| Discovering the cause of incorrect predictions | VGG-Sound | Avg Highest Confidence (Top 25%): 26.56 | 8 |