| Task Name | Dataset Name | SOTA Result | Trend | |
|---|---|---|---|---|
| Analysis-synthesis | Music Academic | FAD 0 | 24 | |
| Audio-Visual Sound Separation | MUSIC-21 (test) | SDR 10.36 | 24 | |
| Regression | music | Mean 0.598 | 24 | |
| Generalization Performance | music | Avg Generalization Error 0.21 | 24 | |
| Sound Separation | MUSIC-clean+ | CLAPt 6.94 | 18 | |
| Audio Generation | Music clean (test) | Generation Success Rate 100 | 18 | |
| Audio Generation | Music Noise SNR=-10 (test) | Generation Success Rate 85 | 18 | |
| Hypernym discovery | music Gold standard domain-specific (test) | MRR 80.6 | 18 | |
| Target Sound Extraction | MUSIC21 (test) | SDRi 9.47 | 17 | |
| Sequential Recommendation | Music | Recall 14.02 | 14 | |
| Analysis-synthesis | Music Industrial | FAD 0 | 12 | |
| ECHO-related classification | MUSIC (test) | LVEF < 40% Classification 68 | 12 | |
| Single Sound Source Localization | MUSIC Solo (test) | IoU@0.5 62.1 | 10 | |
| Multi-source sound localization | MUSIC-Duet | CIoU@0.3 32.5 | 9 | |
| Music Genre Classification | Music (test) | Accuracy 84.06 | 9 | |
| SCD outcome prediction | MUSIC | AUROC 0.6223 | 8 | |
| Video-to-audio generation | MUSIC (test) | Overall Score 4.3 | 8 | |
| Audio source separation | MUSIC (test) | SDR 7.64 | 8 | |
| Audio-visual source localization | MUSIC-Solo (test) | cIoU 53.78 | 7 | |
| Text-prompted separation | Music | SAJ 4.45 | 7 | |
| Sound source separation | MUSIC | SDR 8.82 | 7 | |
| Audio-visual source separation | MUSIC duets | SDR 10.25 | 6 | |
| Audio-visual source separation | MUSIC solos | SDR 9.04 | 6 | |
| Audio Quality Assessment | Music | PCC Overall 0.815 | 5 | |
| Audio Reconstruction | Music | MUSHRA Score 86.6 | 5 | |