| Task Name | Dataset Name | SOTA Result | Trend | |
|---|---|---|---|---|
| Audio Watermarking Attribution | MusicCaps | Accuracy (Att.) (%) 100 | 352 | |
| Audio Watermark Attribution | MusicCaps (test) | Attribution Accuracy 100 | 85 | |
| Audio Watermark Detection | MusicCaps (test) | Detection Accuracy 100 | 85 | |
| Audio Watermark Detection | MusicCaps balanced (val) | Accuracy 100 | 85 | |
| Text-to-Music Generation | MusicCaps (evaluation set) | FAD 2.18 | 20 | |
| Music Generation | MusicCaps (test) | FAD 0 | 16 | |
| Music-to-Text Retrieval | MusicCaps | R@1 24.6 | 12 | |
| Text-to-Music Generation | MusicCaps | KLD 1.01 | 11 | |
| Music Generation | MusicCaps | FAD 1.12 | 11 | |
| Text-to-Audio Generation | MusicCaps | FDopenl3 108.69 | 10 | |
| Audio Captioning | MusicCaps | Captioning Score 23.33 | 8 | |
| Music Generation | MusicCaps (full) | Aes 8.26 | 8 | |
| Music Captioning | MusicCaps (test) | METEOR 23.4 | 8 | |
| Text-to-Music Generation | MusicCaps unbalanced (test) | FAD 2 | 7 | |
| Music Reconstruction | MusicCaps | VISQOL Score 4.06 | 6 | |
| Text-to-Music Generation | MusicCaps genre-balanced (test) | T2M-QLT 85.7 | 6 | |
| Music Generation | MusicCaps 2023 (test) | FADVGG 2.134 | 5 | |
| Audio Generation Quality | MusicCaps MusicGen 32kHz (val) | FAD (VGGish) 0.247 | 4 | |
| Music Generation | MusicCaps 25s-long clips | FD (OpenL3) 85.2023 | 4 | |
| Music Generation | MusicCaps 10s-long clips | FD (OpenL3) 74.4559 | 4 | |
| Audio Captioning | MusicCaps (MC) non-vocal | SBERT Similarity 0.478 | 4 | |
| Text-to-Audio Generation | MusicCaps | Structure: Intro 92.1 | 4 | |
| Text-to-Music Retrieval | MusicCaps | R@1 6.69 | 4 | |
| Music Retrieval | MusicCaps | Precision 98 | 3 | |
| Text-to-Music Generation | MusicCaps | CLAP Similarity (Benign, User Question) 0.33 | 3 |