| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Text-to-Text Retrieval | MECAT | Recall@1 0.2545 | 13 |
| Text-to-Audio Retrieval | MECAT (test) | Recall@1 8.02 | 13 |
| Mixed-Audio Generation | MECAT Speech + Audio (S0A) | FADVGG 30.38 | 10 |
| Audio Generation | MECAT S00 | FADVGG 26.74 | 5 |
| Audio Generation | MECAT 0M0 | FADVGG 21.68 | 5 |
| Audio Generation | MECAT 00A | FADVGG 51.42 | 5 |
| Mixed-Audio Generation | MECAT Speech + Music (SM0) | FADVGG 19.83 | 5 |
| Mixed-Audio Generation | MECAT 0MA Music + Audio | FADVGG Score 3.25 | 5 |
| Audio Captioning | MECAT | FENSE Content Long Score 60.11 | 3 |