| Dataset Name | SOTA Method | Metric | Value | Trend | Last Updated |
|---|---|---|---|---|---|
| RiTTA (test) | AudioLDM (L-Full) | FAD | 5.47 | 11 | 1mo ago |
| AudioCaps 2019 (test) | UNISON (D24, 16kHz) | FAD | 1.558 | 10 | 2d ago |
| VGGSound-Omni (test) | Omni2Sound | KL Divergence | 1.35 | 10 | 3mo ago |
| AudioCaps | AudioLDM 2 | FD (OpenL3) | 1.86 | 10 | 27d ago |
| AudioSet Strong | T2A-Adapter | F1 (Event) | 54.36 | 9 | 3mo ago |
| Downstream Audio Generation TTA | LoSATok | FAD | 1.987 | 8 | 6d ago |
| Text-to-Audio (test) | T2A-Adapter | Loudness MAE | 1.4 | 7 | 3mo ago |
| OpenBookQA (test) | SIFT | Accuracy | 54.3 | 6 | 1mo ago |
| COSE (test) | BLUR | Accuracy | 53.6 | 6 | 1mo ago |
| ESNLI (test) | SIFT | Accuracy | 79.6 | 6 | 1mo ago |
| AudioBox | TangoFlux | Clarity Score (CE) | 3.54 | 6 | 3mo ago |
| AudioCaps multi-event prompts | TangoFlux | FD (OpenL3) | 75.2 | 5 | 3mo ago |
| English brief-answers | Mini-Omni | Avg CER | 1.01 | 3 | 28d ago |
| T2A Evaluation Set | AudioX | Overlap Score (OVL) | 81.5 | 3 | 1mo ago |