| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Audio Representation Evaluation | HEAR (Holistic Evaluation of Audio Representations) | CREMA-D: 76.7 | 35 |
| Scene-based Audio Classification | HEAR Environmental Sound tasks | ESC-50 Accuracy: 78.9 | 5 |
| Scene-based Audio Classification | HEAR Speech tasks | CREMA-D Score: 0.656 | 5 |
| Audio Scene Classification | HEAR Music 2021 | Beijing: 0.966 | 5 |