| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Video Question Answering | EgoSchema (Full) | Accuracy: 75 | 193 |
| Video Question Answering | EgoSchema | Accuracy: 77.2 | 88 |
| Video Question-Answering | EgoSchema (test) | Accuracy: 77.9 | 80 |
| Video Question Answering | EgoSchema subset | Accuracy: 75.4 | 73 |
| Multiple Choice Video Question Answering | EgoSchema | Accuracy: 72.2 | 61 |
| Video Question Answering | EgoSchema 500-question subset | Accuracy: 71.2 | 50 |
| Video Understanding | EgoSchema | Accuracy: 69.5 | 49 |
| Egocentric Video Understanding | EgoSchema | Subset Accuracy: 63.4 | 39 |
| Long-form Video Understanding | EgoSchema | Accuracy: 72.2 | 38 |
| Video Understanding | EgoSchema (test) | Accuracy: 77.9 | 34 |
| Video Question Answering | EgoSchema 5031 videos (test) | Top-1 Accuracy: 62.4 | 26 |
| Long-form Egocentric Video Understanding | EgoSchema | Accuracy: 78.2 | 25 |
| Multi-choice Video Question Answering | EgoSchema (test) | Accuracy: 72.2 | 19 |
| Multiple-Choice Video QA | EgoSchema latest (test) | Accuracy: 72.2 | 17 |
| Long Video Question Answering | EgoSchema (full set) | Accuracy: 55.6 | 17 |
| Egocentric Video Question Answering | EgoSchema (public leaderboard) | Accuracy: 75 | 13 |
| Video Question Answering | EgoSchema Zero-shot | Accuracy: 81.8 | 11 |
| Multi-choice Video Question Answering | EgoSchema Subset 500 questions | Accuracy: 66.4 | 10 |
| Question Answering | EgoSchema | Accuracy: 58.4 | 9 |
| Long-form first-person temporal reasoning | EgoSchema (test) | Accuracy: 51.2 | 9 |
| Multi-choice Video Question Answering | EgoSchema (subset) | Accuracy: 68.6 | 7 |
| Long-form Video Question Answering | EgoSchema EGO4D | QA Accuracy: 67.2 | 7 |
| Video Question Answering | EgoSchema (val) | Accuracy: 81.6 | 6 |
| Video Question Answering | EgoSchema 2023 (test) | Accuracy: 58.55 | 6 |
| Long-form Video Question Answering | EgoSchema | Accuracy: 76.2 | 6 |