| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Single Sound Source Localization | MUSIC Solo (test) | IoU@0.5 | 98.5 | 26 |
| Analysis-synthesis | Music Academic | FAD | 0 | 24 |
| Audio-Visual Sound Separation | MUSIC-21 (test) | SDR | 10.36 | 24 |
| Regression | music | Mean | 0.598 | 24 |
| Generalization Performance | music | Avg Generalization Error | 0.21 | 24 |
| Sound Separation | MUSIC-clean+ | CLAPt | 6.94 | 18 |
| Audio Generation | Music clean (test) | Generation Success Rate | 100 | 18 |
| Audio Generation | Music Noise SNR=-10 (test) | Generation Success Rate | 85 | 18 |
| Hypernym discovery | music Gold standard domain-specific (test) | MRR | 80.6 | 18 |
| Target Sound Extraction | MUSIC21 (test) | SDRi | 9.47 | 17 |
| Audio source separation | MUSIC (test) | SDR | 11.61 | 16 |
| Sequential Recommendation | Music | Recall | 14.02 | 14 |
| Rating Prediction | Music unbiased (test) | AUC | 68.8 | 12 |
| Analysis-synthesis | Music Industrial | FAD | 0 | 12 |
| ECHO-related classification | MUSIC (test) | LVEF < 40% Classification | 68 | 12 |
| Open-Ended Abstract Retrieval (OAR) | Music | Diversity | 49.2 | 11 |
| Audio Restoration | Music Datasets (test) | Mel Distance | 1.2115 | 9 |
| Multi-source sound localization | MUSIC-Duet | CIoU@0.3 | 32.5 | 9 |
| Music Genre Classification | Music (test) | Accuracy | 84.06 | 9 |
| SCD outcome prediction | MUSIC | AUROC | 0.6223 | 8 |
| Video-to-audio generation | MUSIC (test) | Overall Score | 4.3 | 8 |
| Audio-visual source localization | MUSIC-Solo (test) | cIoU | 53.78 | 7 |
| Text-prompted separation | Music | SAJ | 4.45 | 7 |
| Sound source separation | MUSIC | SDR | 8.82 | 7 |
| Audio-visual source separation | MUSIC duets | SDR | 10.25 | 6 |