Visual Semantic Navigation using Scene Priors

About

How do humans navigate to target objects in novel scenes? Do we use the semantic/functional priors we have built over years to efficiently search and navigate? For example, to search for mugs, we search cabinets near the coffee machine, and for fruits we try the fridge. In this work, we focus on incorporating semantic priors in the task of semantic navigation. We propose to use Graph Convolutional Networks for incorporating the prior knowledge into a deep reinforcement learning framework. The agent uses the features from the knowledge graph to predict the actions. For evaluation, we use the AI2-THOR framework. Our experiments show that semantic knowledge improves performance significantly. More importantly, we show improvement in generalization to unseen scenes and/or objects. The supplementary video can be accessed at the following link: https://youtu.be/otKjuO805dE
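
To make the graph convolution idea concrete, here is a minimal sketch of a single propagation step in the standard form ReLU(D^{-1/2} (A + I) D^{-1/2} H W). It is not the authors' implementation: the three-node graph (mug, coffee machine, cabinet), the feature values, and the weight matrix are all hypothetical, and the abstract does not specify layer sizes or how node features are built.

```python
import math

def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a square adjacency matrix."""
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def gcn_layer(norm_adj, features, weights):
    """One propagation step: ReLU(norm_adj @ features @ weights)."""
    out = matmul(matmul(norm_adj, features), weights)
    return [[max(0.0, v) for v in row] for row in out]

# Hypothetical toy knowledge graph: mug - coffee machine - cabinet.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
features = [[1.0, 0.0],   # hypothetical 2-d feature per object node
            [0.0, 1.0],
            [1.0, 1.0]]
weights = [[1.0],         # hypothetical 2 -> 1 projection
           [1.0]]

node_embeddings = gcn_layer(normalize_adjacency(adj), features, weights)
```

In a navigation agent of the kind the abstract describes, embeddings like `node_embeddings` would be combined with visual features before the policy predicts an action; how that fusion is done is not stated here.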

Wei Yang, Xiaolong Wang, Ali Farhadi, Abhinav Gupta, Roozbeh Mottaghi • 2018

Related benchmarks

Task	Dataset	Result
Visual Navigation	AI2-THOR Unseen Scenes (L >= 5) (test)	SPL 11.37	11
Visual Navigation	AI2-THOR Unseen Scenes (All) (test)	SPL 15.47	7
SLICE	AI2-iTHOR (test)	SLICE Task Success Rate 36	5
COOL	AI2-iTHOR (test)	Task Success Rate 14	5
ObjectNav	iTHOR Seen class (14/8)	Success Rate 79.3	5
PREP	AI2-iTHOR (test)	Task Success Rate 26	5
Robotic object search	AI2-THOR Seen Env., Seen Goals 1.0	Success Rate 62	5
Robotic object search	AI2-THOR 1.0 (Seen Env., Unseen Goals)	Success Rate (SR) 48	5
Robotic object search	AI2-THOR Unseen Env., Seen Goals 1.0	Success Rate (SR) 56	5
Robotic object search	AI2-THOR Unseen Env., Unseen Goals 1.0	Success Rate (SR) 49	5
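
For readers unfamiliar with the SPL metric in the table, the sketch below computes Success weighted by Path Length as it is commonly defined: the mean over episodes of S_i * l_i / max(p_i, l_i), where S_i is 1 on success, l_i is the shortest path length, and p_i is the path length the agent actually took. The function name and the episode tuples are hypothetical. It is also an assumption that the table reports SPL as a percentage and that "L >= 5" restricts evaluation to episodes whose shortest path is at least 5 steps.

```python
def spl(episodes):
    """Success weighted by Path Length.

    episodes: iterable of (success, shortest_path_len, agent_path_len).
    Returns a value in [0, 1]; multiply by 100 for a percentage.
    """
    episodes = list(episodes)
    if not episodes:
        return 0.0
    total = 0.0
    for success, shortest, taken in episodes:
        denom = max(taken, shortest)
        # Failed episodes contribute 0; guard against zero-length paths.
        if success and denom > 0:
            total += shortest / denom
    return total / len(episodes)

# Hypothetical episodes: optimal success, success with a 2x detour, failure.
example = [(True, 5, 5), (True, 5, 10), (False, 5, 20)]
print(spl(example))  # 0.5
```

Because failures score zero and detours are penalized, SPL is always at most the plain success rate on the same episodes.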

