| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Audio Classification | AudioSet 20K | mAP 47.8 | 147 |
| Anti-steganalysis | Audioset | P_E 49.94 | 99 |
| Audio Classification | AudioSet 2M | mAP 50.5 | 98 |
| Audio Reconstruction | AudioSet (eval) | Mel Distance 0.382 | 63 |
| 1D audio reconstruction | AudioSet | NMSE 0.006 | 63 |
| Audio Classification | AudioSet | mAP 49.6 | 60 |
| Classification | AudioSet (test) | mAP 49.6 | 57 |
| Audio Event Tagging | AudioSet AS-2M (full) | mAP 50.2 | 45 |
| Sound Classification | AudioSet (evaluation) | mAP 47.1 | 39 |
| Acoustic event detection | AudioSet (test) | mAP 0.462 | 34 |
| Audio Classification | AudioSet-2M (full) | mAP 48.6 | 32 |
| Audio Tagging | AudioSet (test) | mAP 50 | 25 |
| Audio Event Tagging | AudioSet (AS-20K) | mAP 46.7 | 24 |
| Audio Reconstruction | AudioSet (test) | Mel Distance (16kHz) 0.32 | 23 |
| Audio Classification | AudioSet Full (test) | mAP 45.9 | 23 |
| Classification | AudioSet AS-2M | mAP (%) 50.2 | 21 |
| Audio Classification | AudioSet 20k (train test) | mAP 31.67 | 19 |
| Generalized Zero-Shot Retrieval (Text-to-Audio) | AudioSet ZSL (test) | mAP (S) 72.25 | 19 |
| Sound Event Detection | AudioSet Strongly-labeled (test) | PSDS1 (w/o var-pen) 0.374 | 18 |
| Audio-visual event classification | AudioSet 2M | mAP (Audio-only) 49.1 | 16 |
| Generalized Zero-Shot Classification | AudioSet ZSL (test) | mAcc (Seen) 50.96 | 16 |
| Audio Generation | AudioSet AAR 20k | Minimum LSD 0 | 15 |
| Autoregressive audio (AAR) | AudioSet 20k (subset of 100 random 10 s clips) | Compression Ratio 2.75 | 15 |
| Multi-class Music Classification | AudioSet | Accuracy 58.71 | 14 |
| Audio Tagging | AudioSet balanced (AS-20k) | mAP 40.2 | 14 |