| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Audio-Visual Question Answering | AVQA | Accuracy | 92 | 14 |
| Audio-Visual Question Answering | AVQA (test) | Total Accuracy | 93.8 | 13 |
| Audio-Visual Question Answering | AVQA (val) | Existence Accuracy | 88.24 | 9 |
| Audio-Visual Question Answering | AVQA (subset, 2000 samples) | ASR Accuracy | 96.03 | 7 |
| Audio-Visual Question Answering | AVQA | AVQA Clean Accuracy | 95.6 | 7 |
| Audio-Visual Question Answering | AVQA 69 (test) | Accuracy | 93.8 | 5 |