| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Audio Classification | ESC-50 | Accuracy 99.25 | 325 |
| Audio Classification | ESC50 | Top-1 Acc 96.5 | 64 |
| Target Sound Extraction | ESC-50 (test) | SISDRi 12.37 | 46 |
| Environmental Sound Classification | ESC-50 (5-fold cross-validation) | Accuracy 98.1 | 33 |
| Environmental Sound Classification | ESC | Top-1 Acc 91.8 | 28 |
| Classification | ESC-50 (test) | Accuracy 96.35 | 16 |
| Environmental Sound Classification | ESC-50 (10-fold cross-validation) | Accuracy 96.1 | 13 |
| Audio Classification | ESC50 (In-Domain) | AI 43.35 | 12 |
| Zero-shot Audio Classification Explanation | ESC50 White Noise contamination | AI Score 32.97 | 12 |
| Audio Classification | ESC | Top-1 Accuracy 95.5 | 10 |
| Audio Classification | ESC-Actions | Accuracy 91.5 | 10 |
| Event Causality Identification | ESC cross-topic partition | Precision 0.505 | 10 |
| Audio Classification | ESC-50 (val) | Top-1 Acc 99 | 10 |
| Environmental Sound Classification | ESC-50 (incremental split (5 tasks)) | Accuracy 50 | 6 |
| Sound Event Classification | ESC-50 Simulated Distributed Layouts (five-fold cross-validation) | Accuracy (Circular) 36.8 | 6 |
| Audio Classification | ESC-50 500 labels | Top-1 Error Rate 0.2592 | 6 |
| Audio Classification | ESC-50 250 labels | Top-1 Error Rate 29.71 | 6 |
| Audio Classification | ESC50 | Base Score 61.97 | 4 |
| Action Generation | ESC | Overall Score 4.82 | 3 |
| Thought Generation | ESC | Overall Score 4.87 | 3 |
| Clue Generation | ESC | Overall Score 4.83 | 3 |
| Event Causality Identification | ESC (random partition) | Precision 64.2 | 3 |
| Dialogue Response Generation | ESC (test) | Accuracy 88 | 3 |
| Audio Event Classification | ESC-50 (evaluation) | Accuracy 96.1 | 2 |