| Dataset Name | SOTA Method | Metric | Trend | Updated | |
|---|---|---|---|---|---|
| AudioCaps (test) | AUDIOBOX SOUND | FAD 0.77 | 138 | 4d ago | |
| Clotho (test) | Make-An-Audio | FID 17.23 | 17 | 4d ago | |
| MusicCaps | Stable Audio | FDopenl3 108.69 | 10 | 4d ago | |
| evaluation benchmarks one-to-one | TangoFlux | CLAP Score 42.71 | 6 | 4d ago | |
| One-to-one evaluation benchmarks Text-to-Audio | TangoFlux | FAD 1.41 | 6 | 4d ago | |
| AudioCaption (test) | | CLAP Score 0.526 | 6 | 4d ago | |
| VGGSound | Siren + Plan-Critic | CLAP Score (Overall) 35.88 | 5 | 4d ago | |
| RiTTA Count OOD (test) | TangoFlux-RAG | KL Divergence 2.18 | 4 | 4d ago | |
| EpicBench T2A | Make-An-Audio 2 | EOS 19.96 | 4 | 4d ago | |
| Human Evaluation Subjective Audio Assessment (test) | TangoFlux | Z-Score (OVL) 0.2486 | 4 | 4d ago | |
| MusicCaps | Stable Audio | Structure: Intro 92.1 | 4 | 4d ago | |
| AudioSet (eval) | AudioLDM-L-Full | FD 24.26 | 4 | 4d ago | |
| Kling-Audio Eval | Omni2Sound | KL Divergence 2.36 | 3 | 4d ago | |
| AudioCaps 5s (test) | Make-An-Audio 2 | FD (Fréchet Distance) 13.78 | 3 | 4d ago | |
| Clotho eval 10s (test) | Make-An-Audio 2 | FD 19.97 | 3 | 4d ago | |
| AudioCaps | Stable Audio | Stereo Correctness 0.57 | 1 | 4d ago | |