The Regretful Agent: Heuristic-Aided Navigation through Progress Estimation

About

As deep learning continues to make progress for challenging perception tasks, there is increased interest in combining vision, language, and decision-making. Specifically, the Vision and Language Navigation (VLN) task involves navigating to a goal purely from language instructions and visual information without explicit knowledge of the goal. Recent successful approaches have made inroads in achieving good success rates for this task but rely on beam search, which thoroughly explores a large number of trajectories and is unrealistic for applications such as robotics. In this paper, inspired by the intuition of viewing the problem as search on a navigation graph, we propose to use a progress monitor developed in prior work as a learnable heuristic for search. We then propose two modules incorporated into an end-to-end architecture: 1) a learned mechanism to perform backtracking, which decides whether to continue moving forward or roll back to a previous state (Regret Module), and 2) a mechanism to help the agent decide which direction to go next by showing directions that are visited and their associated progress estimate (Progress Marker). Combined, the proposed approach significantly outperforms current state-of-the-art methods using greedy action selection, with 5% absolute improvement on the test server in success rates and, more importantly, 8% on success rates normalized by the path length. Our code is available at https://github.com/chihyaoma/regretful-agent.
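The "success rates normalized by the path length" mentioned above correspond to the SPL metric (Success weighted by Path Length) that also appears in the benchmark table. As a rough illustration only, and not the authors' evaluation code, the commonly used definition can be sketched as follows. The function name `spl` and its argument names are hypothetical, and the sketch assumes each episode provides a success flag, the shortest-path length to the goal, and the length of the path the agent actually took.

```python
from typing import Sequence


def spl(
    successes: Sequence[bool],
    shortest_lengths: Sequence[float],
    path_lengths: Sequence[float],
) -> float:
    """Success weighted by Path Length, averaged over all episodes.

    Hypothetical helper for illustration: each successful episode
    contributes shortest / max(taken, shortest); failures contribute 0.
    """
    n = len(successes)
    if n == 0:
        raise ValueError("at least one episode is required")
    if len(shortest_lengths) != n or len(path_lengths) != n:
        raise ValueError("all inputs must have the same length")

    total = 0.0
    for success, shortest, taken in zip(successes, shortest_lengths, path_lengths):
        if not success:
            continue  # failed episodes add nothing to the sum
        denom = max(taken, shortest)
        # Assumption: a zero-length episode that succeeds counts as optimal.
        total += shortest / denom if denom > 0 else 1.0
    return total / n


# Three episodes: an optimal success, a success with a path twice as long
# as necessary, and a failure.
score = spl([True, True, False], [10.0, 10.0, 10.0], [10.0, 20.0, 10.0])
print(score)
```

Under this definition an agent that succeeds only after long detours scores lower than one that reaches the goal directly, which is why the metric penalizes exhaustive exploration such as beam search.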

Chih-Yao Ma, Zuxuan Wu, Ghassan AlRegib, Caiming Xiong, Zsolt Kira • 2019

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Vision-and-Language Navigation | R2R (val unseen) | Success Rate (SR) | 50 | 260 |
| Vision-Language Navigation | R2R (test unseen) | SR | 56 | 122 |
| Vision-Language Navigation | R2R (val seen) | Success Rate (SR) | 69 | 120 |
| Vision-Language Navigation | R2R Unseen (test) | SR | 48 | 116 |
| Vision-and-Language Navigation | Room-to-Room (R2R) Unseen (val) | SR | 50 | 52 |
| Vision-and-Language Navigation | R4R unseen (val) | Success Rate (SR) | 19.2 | 52 |
| Vision-and-Language Navigation | R2R (test) | SPL (Success weighted Path Length) | 40 | 38 |
| Vision-and-Language Navigation | Room-to-Room (R2R) Seen (val) | NE (Navigation Error) | 3.23 | 32 |
| Vision-and-Language Navigation | Room-to-Room (R2R) (test unseen) | SR | 48 | 24 |
| Vision-and-Language Navigation | R6R unseen (val) | PL | 15.9 | 22 |
Showing 10 of 22 rows
