| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Video-driven Audio Hallucination | AVHBench | Accuracy | 83.4 | 27 |
| Cross-modal hallucination evaluation | AVHBench | Overall Accuracy | 88.19 | 22 |
| Audiovisual Matching | AVHBench | Accuracy | 69.68 | 14 |
| Audio-Visual Understanding | AVHBench | Overall Score | 81.7 | 8 |
| Audio-Visual QA | AVHBench | Accuracy | 73.78 | 6 |
| Audio-Visual Captioning | AVHBench | METEOR | 17.2 | 5 |
| Audiovisual Understanding & Reasoning | AVHBench AVC | Score | 22.6 | 4 |
| Audiovisual Understanding & Reasoning | AVHBench AVM | Score | 61.6 | 4 |