| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Temporal Attribution | Audio | I(100) | 74.3 | 13 |
| Negative Temporal Attribution | Audio | Δŷc (2%) | -0.25 | 13 |
| Analysis-synthesis | Audio Industrial | FAD | 0 | 12 |
| Audio Compression Quality Assessment | Audio 24kHz | Speech Quality Score | 94.65 | 12 |
| Boolean Matrix Factorization Completion | audio missing entries | Objective Value Improvement | 41 | 9 |
| Audio Coding | Audio 16kHz 22kHz (test) | Bitrate (kbps) | 0.7 | 8 |
| Generation Success Rate | Audio suite (test) | Gauss Success Rate | 75.22 | 6 |
| Inference Speed | Audio 120s (test) | Inference Time (ms) | 479 | 5 |
| Inference Speed | Audio 60s (test) | Inference Time (ms) | 244 | 5 |
| Inference Speed | Audio 30s (test) | Inference Time (ms) | 125 | 5 |
| Inference Speed | Audio 10s (test) | Inference Time (ms) | 42 | 5 |
| Audio Quality Evaluation | Audio Evaluation Set | ESTOI | 43 | 5 |
| Density Estimation | Audio Twenty Datasets (test) | Log-LH | -39.74 | 4 |
| Text-to-Audio Classification | audio 2024 (test) | Species Top-1 Accuracy | 27.7 | 2 |
| Audio-to-Text Classification | audio 2024 (test) | Species Top-1 Accuracy | 24.4 | 2 |
| Audio Reconstruction Quality | 48 kHz audio (test) | STOI | 0.996 | 1 |