| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Voiced/Unvoiced Detection | Speech | V/UV Recall | 94.21 | 50 |
| Audio Generation | Speech clean (test) | Generation Success Rate | 100 | 18 |
| Audio Generation | Speech Noise SNR=-10 (test) | Success Rate | 80 | 18 |
| Anomaly Detection | Speech | AUC-ROC | 0.6073 | 16 |
| Tabular Anomaly Detection | Speech | AUC-ROC | 0.676 | 14 |
| Anomaly Detection | Speech ODDS | AUC | 62.4 | 12 |
| Speech Synthesis | Speech Industrial Setting | MOS Prediction | 4.29 | 11 |
| Speech Synthesis | Speech Academic Setting | MOS Prediction | 3.65 | 11 |
| Speech Separation | Speech (test) | SI-SDRi | 18.8 | 11 |
| Fundamental Frequency Estimation | Speech SNR 0 dB | RPA50 | 13.66 | 10 |
| Fundamental Frequency Estimation | Speech SNR 10 dB | RPA50 | 68.85 | 10 |
| Fundamental Frequency Estimation | Speech SNR 20 dB | RPA50 | 78.01 | 10 |
| Fundamental Frequency Estimation | Speech SNR 30 dB | RPA50 | 80.24 | 10 |
| Fundamental Frequency Estimation | Speech SNR ∞ | RPA50 | 80.91 | 10 |
| Text-prompted separation | Speech | SAJ | 4.67 | 9 |
| Audio Reconstruction | Speech | MUSHRA | 90.5 | 6 |
| Audio Quality Assessment | Speech | PCC Overall | 0.883 | 5 |
| Speech to Sound generation | Speech-S | WER (%) | 6.15 | 3 |
| Audio-to-Text Retrieval | Speech (test) | R@1 | 0.51 | 3 |
| Text-to-Audio Retrieval | Speech (test) | R@1 | 7.1 | 3 |