| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Action Recognition | EPIC-SOUNDS | Top-1 Accuracy: 78.2 | 17 |
| Audio-to-Text Retrieval | EPIC-Sounds | mAP: 50.3 | 8 |
| Text-to-Audio Retrieval | EPIC-Sounds | mAP: 17 | 8 |
| Audio-Visual Action Recognition | HD-EPIC-SOUNDS | Top-1 Accuracy: 31.9 | 7 |
| Audio-visual Recognition | EPIC-SOUNDS (test val) | Accuracy: 61 | 5 |
| Audio Recognition | EPIC-Sounds (test) | Top-1 Accuracy: 55.9 | 5 |
| Sound Recognition | EPIC-SOUNDS (val) | Top-1 Accuracy: 58.3 | 5 |
| Audio Detection | EPIC-Sounds (test) | AP@0.1: 15.7 | 2 |