Video Swin Transformers for Egocentric Video Understanding @ Ego4D Challenges 2022
About
We implemented Video Swin Transformer as the base architecture for two Ego4D tasks: Point-of-No-Return (PNR) temporal localization and Object State Change Classification (OSCC). Our method achieved competitive performance on both challenges.
Maria Escobar, Laura Daza, Cristina González, Jordi Pont-Tuset, Pablo Arbeláez • 2022
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Object State Change Classification (OSCC) | Ego4D (test) | Accuracy | 68 | 13 |
| Object State Change Classification (OSCC) | Ego4D (val) | Accuracy | 69.8 | 12 |
| Point of No Return (PNR) | Ego4D (test) | PNR Error (s) | 0.66 | 10 |
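For readers unfamiliar with the two metrics in the table, the following is a minimal sketch of how they could be computed. It is not the challenge's official evaluation code. It assumes that accuracy is reported as a percentage and that the PNR error is the mean absolute distance between predicted and annotated PNR frames, converted to seconds. The function names and the 30 fps default are hypothetical choices made for illustration.

```python
from typing import Sequence


def oscc_accuracy(predicted: Sequence[bool], actual: Sequence[bool]) -> float:
    """Percentage of clips whose state-change label is predicted correctly."""
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


def pnr_error_seconds(
    predicted_frames: Sequence[int],
    actual_frames: Sequence[int],
    fps: float = 30.0,  # assumed frame rate, not taken from the source
) -> float:
    """Mean absolute PNR localization error, converted from frames to seconds."""
    if len(predicted_frames) != len(actual_frames) or len(actual_frames) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total_frames_off = sum(abs(p - a) for p, a in zip(predicted_frames, actual_frames))
    return total_frames_off / (len(actual_frames) * fps)
```

Under these assumptions, a higher accuracy is better for OSCC, while a lower error is better for PNR.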