| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Omnimodal Question Answering | OmniVideoBench 1.0 (test) | Compare Attr 44.44 | 18 |
| Audio-visual Question Answering | OmniVideoBench | Accuracy 0.356 | 18 |
| Audio-Visual Joint Reasoning | OmniVideoBench | Music Score 56.2 | 11 |
| Audio-Video Understanding | OmniVideoBench | Latency (0-1s Bin) 28.92 | 9 |