EGO-TOPO: Environment Affordances from Egocentric Video

About

First-person video naturally brings the use of a physical environment to the forefront, since it shows the camera wearer interacting fluidly in a space based on his intentions. However, current methods largely separate the observed actions from the persistent space itself. We introduce a model for environment affordances that is learned directly from egocentric video. The main idea is to gain a human-centric model of a physical space (such as a kitchen) that captures (1) the primary spatial zones of interaction and (2) the likely activities they support. Our approach decomposes a space into a topological map derived from first-person activity, organizing an ego-video into a series of visits to the different zones. Further, we show how to link zones across multiple related environments (e.g., from videos of multiple kitchens) to obtain a consolidated representation of environment functionality. On EPIC-Kitchens and EGTEA+, we demonstrate our approach for learning scene affordances and anticipating future actions in long-form video.

Tushar Nagarajan, Yanghao Li, Christoph Feichtenhofer, Kristen Grauman • 2020
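
As a rough illustration of the idea in the abstract (organizing an ego-video into a series of visits to zones, then linking zones into a topological map), here is a minimal sketch. It assumes each frame has already been assigned a zone label; in the paper that assignment comes from the learned model, which is not reproduced here. The function names `frames_to_visits` and `build_topological_map` and the example labels are hypothetical and are not taken from the authors' code.

```python
from collections import defaultdict
from itertools import groupby


def frames_to_visits(zone_per_frame):
    """Collapse per-frame zone labels into (zone, start_frame, end_frame) visits.

    Assumption: zone labels are given; this sketch does not learn them.
    """
    visits = []
    idx = 0
    for zone, run in groupby(zone_per_frame):
        length = len(list(run))
        visits.append((zone, idx, idx + length - 1))
        idx += length
    return visits


def build_topological_map(visits):
    """Build a simple graph: one node per zone, one edge per observed transition."""
    nodes = defaultdict(list)  # zone -> list of (start, end) visits
    edges = set()              # undirected edges between consecutively visited zones
    for zone, start, end in visits:
        nodes[zone].append((start, end))
    for (a, _, _), (b, _, _) in zip(visits, visits[1:]):
        edges.add(frozenset((a, b)))
    return dict(nodes), edges


# Hypothetical per-frame labels for a short kitchen clip.
frames = ["sink", "sink", "stove", "stove", "stove", "sink", "fridge"]
visits = frames_to_visits(frames)
nodes, edges = build_topological_map(visits)
```

Here the nodes keep every visit to a zone, so repeated trips to the same zone (the two "sink" visits above) accumulate on one node rather than creating a new one.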

Related benchmarks

Task | Dataset | Metric | Result | Rank
Unweaving | EPIC-KITCHENS activity-story (test) | RI (2) | 66.2 | 8
Natural Language Query | MP3D 6 (val) | Rank-1 Success @ 0.3 | 36.1 | 8
Natural Language Query | HouseTours 7 (val) | Rank@1 (Thresh 0.3) | 43.36 | 8
Room Prediction | MP3D (val) | Accuracy | 41.19 | 8
Room Prediction | HouseTours (val) | Accuracy | 58.05 | 8
Room Prediction | Ego4D | Accuracy | 49.42 | 7
Natural Language Query | Ego4D 26 (val) | Rank-1 @ IoU 0.3 | 5.45 | 7