| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Hand Pose Estimation | Ego4D HInt v1 (test) | PCK@0.05 | 59.3 | 32 |
| Long-Term Action Anticipation | Ego4D v1 (test) | ED@Z=20 Verb | 0.679 | 31 |
| State Change Classification | Ego4D v1 (test) | Accuracy | 75 | 29 |
| Action Recognition | Ego4D v1 (test) | Top-1 Accuracy (Verb) | 25.1 | 23 |
| Natural Language Queries | Ego4D NLQ (val) | R@1 (IoU=0.3) | 21.97 | 23 |
| Video Grounding | Ego4D-NLQ v1 (test) | R@1 (IoU=0.3) | 20.63 | 21 |
| Point-of-No-Return Temporal Localization | Ego4D v1 (test) | Error | 0.61 | 21 |
| Natural Language Queries | Ego4D NLQ (test) | R@1 (IoU=0.3) | 26.67 | 21 |
| Temporal Grounding | Ego4D-NLQ (test) | R@1 (IoU=0.3) | 22.21 | 20 |
| Moment Query | Ego4D Moment Query (val) | R@1 (IoU=0.5) | 51.04 | 19 |
| Long-Term Anticipation | Ego4D LTA v1 (test) | ED@Z=20 Verb | 0.65 | 18 |
| Verb Recognition | Ego4D | Top-1 Accuracy | 28.93 | 17 |
| Noun Recognition | Ego4D | Top-1 Accuracy | 35.85 | 17 |
| Short-Term Anticipation | Ego4D STA v2 (val) | N mAP | 37.41 | 16 |
| Spatial-Temporal Anticipation | Ego4D STA v1, v2 (val) | Base Performance (B) | 55.98 | 14 |
| Narrative Reasoning | Ego4D (test) | BLEURT | 0.48 | 14 |
| Temporal Grounding | Ego4D-NLQ | R@1 (IoU=0.3) | 16.37 | 14 |
| Action Recognition | MMG-Ego4D 1.0 (test) | Accuracy (5-way 5-shot) | 63 | 13 |
| Object State Change Classification (OSCC) | Ego4D (test) | Accuracy | 75 | 13 |
| Pronoun Coreference Resolution | Ego4D (test) | Accuracy | 52.7 | 12 |
| Speaking Target Identification | Ego4D v1.0 (test) | Accuracy | 66.5 | 12 |
| Mentioned Player Prediction | Ego4D (test) | Accuracy | 55.1 | 12 |
| Object State Change Classification | Ego4D (val) | Accuracy | 76.2 | 12 |
| Temporal Grounding | Ego4D Goalstep (test) | R@1 (Th=0.3) | 23.2 | 11 |
| Moment Query | Ego4D (test) | R@1 (IoU=0.5) | 0.5007 | 11 |