| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Long-form Video Understanding | LVU | Relation Attribute Accuracy | 76.47 | 44 |
| Long-form Video Understanding | LVU (test) | Relation Top-1 Acc | 67.11 | 16 |
| Long-form Video Understanding | LVU 1.0 (test) | Director Accuracy | 78.4 | 14 |
| Video Question Answering | LVU | Accuracy | 76.1 | 13 |
| Long Video Understanding (Classification & Regression) | LVU 53 (test) | Place Accuracy | 68.2 | 10 |
| Long-form Video Classification | LVU | Relation Accuracy | 71.2 | 10 |
| Scene classification | LVU | Accuracy | 0.68 | 4 |
| Way of speaking classification | LVU | Accuracy | 41.3 | 4 |
| Relation classification | LVU | Accuracy | 61.5 | 4 |