| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Audio-visual understanding | Daily-Omni | Accuracy 82.8 | 27 |
| QA performance by Gemini-2.5-Pro based on captions | Daily-Omni (test) | Daily-Omni QA Score 61.2 | 13 |
| Video Question Answering | Daily-Omni | Score 60.2 | 11 |
| Audio-Visual Perception | Daily-Omni | Score 60.65 | 8 |
| Audiovisual Understanding & Reasoning | Daily-Omni | Score 77.9 | 6 |