
Meta-Explore: Exploratory Hierarchical Vision-and-Language Navigation Using Scene Object Spectrum Grounding

About

The main challenge in vision-and-language navigation (VLN) is how to understand natural-language instructions in an unseen environment. The main limitation of conventional VLN algorithms is that a single mistaken action can cause the agent to fail to follow the instructions or to explore unnecessary regions, leading it onto an irrecoverable path. To tackle this problem, we propose Meta-Explore, a hierarchical navigation method that deploys an exploitation policy to correct misled recent actions. We show that an exploitation policy that moves the agent toward a well-chosen local goal among unvisited but observable states outperforms a method that moves the agent to a previously visited state. We also highlight the demand for imagining regretful explorations with semantically meaningful clues. The key to our approach is understanding the object placements around the agent in the spectral domain. Specifically, we present a novel visual representation, called scene object spectrum (SOS), which performs a category-wise 2D Fourier transform of detected objects. By combining the exploitation policy with SOS features, the agent can correct its path by choosing a promising local goal. We evaluate our method on three VLN benchmarks: R2R, SOON, and REVERIE. Meta-Explore outperforms other baselines and shows significant generalization performance. In addition, local goal search using the proposed spectral-domain SOS features significantly improves the success rate by 17.1% and SPL by 20.6% on the SOON benchmark.
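
The category-wise 2D Fourier transform behind SOS can be illustrated with a small sketch. This is not the authors' implementation: it assumes detections arrive as bounding boxes on a small grid, builds one binary mask per object category, and keeps only the magnitudes of the lowest-frequency DFT coefficients. The function names, the box format, and the `num_freqs` parameter are all hypothetical, and the paper's exact construction may differ.

```python
import cmath
import math


def category_masks(detections, num_categories, height, width):
    """Build one binary mask per category from (category, x0, y0, x1, y1) boxes.

    The box format is an assumption made for this sketch.
    """
    masks = [[[0.0] * width for _ in range(height)] for _ in range(num_categories)]
    for cat, x0, y0, x1, y1 in detections:
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                masks[cat][y][x] = 1.0
    return masks


def dft2_magnitude(mask, num_freqs):
    """Magnitudes of the lowest num_freqs x num_freqs 2D DFT coefficients."""
    height, width = len(mask), len(mask[0])
    out = []
    for u in range(num_freqs):
        row = []
        for v in range(num_freqs):
            acc = 0j
            for y in range(height):
                for x in range(width):
                    if mask[y][x]:
                        angle = -2.0 * math.pi * (u * y / height + v * x / width)
                        acc += mask[y][x] * cmath.exp(1j * angle)
            row.append(abs(acc))
        out.append(row)
    return out


def scene_object_spectrum(detections, num_categories, height, width, num_freqs=4):
    """Return a [category][u][v] list of low-frequency spectrum magnitudes."""
    masks = category_masks(detections, num_categories, height, width)
    return [dft2_magnitude(m, num_freqs) for m in masks]


# Usage: one detection of category 0 on an 8x8 grid, two categories in total.
sos = scene_object_spectrum([(0, 0, 0, 2, 2)], num_categories=2, height=8, width=8)
```

In this sketch the zero-frequency term of each category equals the area covered by that category's boxes, and the magnitudes are unchanged when an object is shifted circularly within the grid, so the feature summarizes how much of each category is present and how it is spread rather than its absolute position.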

Minyoung Hwang, Jaeyeon Jeong, Minsoo Kim, Yoonseon Oh, Songhwai Oh • 2023

Related benchmarks

Task                            Dataset                   Metric                              Result   Rank
Vision-and-Language Navigation  R2R (val unseen)          Success Rate (SR)                   72       260
Vision-and-Language Navigation  REVERIE (val unseen)      SPL                                 40.27    129
Vision-Language Navigation      R2R (test unseen)         SR                                  71       122
Vision-Language Navigation      R2R (val seen)            Success Rate (SR)                   81       120
Vision-Language Navigation      R2R Unseen (test)         SR                                  71       116
Navigation                      REVERIE Unseen (test)     SR                                  51.18    43
Vision-and-Language Navigation  R2R (test)                SPL (Success weighted Path Length)  61       38
Vision-and-Language Navigation  REVERIE seen (val)        SR                                  71.89    28
Vision-and-Language Navigation  SOON unseen house (test)  Success Rate                        39.1     10
Vision-and-Language Navigation  SOON seen house (val)     SR                                  4.47e+3  9
Showing 10 of 18 rows
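
For readers unfamiliar with the SPL metric reported above, the following sketch computes it under the standard definition: each successful episode contributes the ratio of the shortest-path length to the longer of the shortest-path length and the agent's actual path length, failed episodes contribute zero, and the result is averaged over all episodes. The tuple layout of `episodes` is an assumption made for illustration, and benchmark pages typically report the value multiplied by 100.

```python
def spl(episodes):
    """Success weighted by Path Length.

    episodes: list of (success, shortest_path_length, agent_path_length).
    This tuple layout is hypothetical and chosen only for the sketch.
    """
    if not episodes:
        raise ValueError("at least one episode is required")
    total = 0.0
    for success, shortest, taken in episodes:
        denom = max(taken, shortest)
        # Failed episodes and degenerate zero-length episodes contribute 0.
        if success and denom > 0:
            total += shortest / denom
    return total / len(episodes)


# Usage: one optimal success, one success with a detour twice as long
# as the shortest path, and one failure.
score = spl([(True, 10.0, 10.0), (True, 10.0, 20.0), (False, 10.0, 5.0)])
```

Because the denominator is never smaller than the shortest-path length, a detour can only lower an episode's contribution, which is why SPL is always at most the plain success rate.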
