| Task Name | Dataset Name | SOTA Result | Trend | |
|---|---|---|---|---|
| Text-to-Music Generation | MusicCaps (evaluation set) | FAD 2.18 | 20 | |
| Text-to-Music Generation | MusicCaps | KLD 1.01 | 11 | |
| Music Generation | MusicCaps | FAD 1.12 | 11 | |
| Music Generation | MusicCaps (test) | FAD 1.12 | 10 | |
| Text-to-Audio Generation | MusicCaps | FDopenl3 108.69 | 10 | |
| Music Generation | MusicCaps (full) | Aes 8.26 | 8 | |
| Text-to-Music Generation | MusicCaps unbalanced (test) | FAD 2 | 7 | |
| Music Reconstruction | MusicCaps | VISQOL Score 4.06 | 6 | |
| Text-to-Music Generation | MusicCaps genre-balanced (test) | T2M-QLT 85.7 | 6 | |
| Music Captioning | MusicCaps (test) | Relevance 5.77 | 5 | |
| Music Generation | MusicCaps 2023 (test) | FADVGG 2.134 | 5 | |
| Music Generation | MusicCaps 25s-long clips | FD (OpenL3) 85.2023 | 4 | |
| Music Generation | MusicCaps 10s-long clips | FD (OpenL3) 74.4559 | 4 | |
| Audio Captioning | MusicCaps (MC) non-vocal | SBERT Similarity 0.478 | 4 | |
| Text-to-Audio Generation | MusicCaps | Structure: Intro 92.1 | 4 | |
| Text-to-Music Retrieval | MusicCaps | R@1 6.69 | 4 | |
| Music-to-Text Retrieval | MusicCaps | R@1 6.37 | 4 | |
| Audio-Visual Retrieval | MusicCaps (test) | Recall@1 20.4 | 2 | |
| Music Understanding | MusicCaps | CLAP Score 0.16 | 2 | |
| Text-to-Music Generation | MusicCaps (test) | REL (General Audience) 4.09 | 1 | |