Tactical Rewind: Self-Correction via Backtracking in Vision-and-Language Navigation
About
We present the Frontier Aware Search with backTracking (FAST) Navigator, a general framework for action decoding that achieves state-of-the-art results on the Room-to-Room (R2R) Vision-and-Language Navigation challenge of Anderson et al. (2018). Given a natural-language instruction and photo-realistic image views of a previously unseen environment, the agent is tasked with navigating from a source to a target location as quickly as possible. While current approaches either make purely local action decisions or score entire trajectories using beam search, ours balances local and global signals while exploring an unobserved environment. Importantly, this lets the agent act greedily but use global signals to backtrack when necessary. Applying the FAST framework to existing state-of-the-art models yields a 17% relative gain, an absolute 6% gain, on Success rate weighted by Path Length (SPL).
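The core idea above, greedy expansion with the option to backtrack to the globally best unexplored frontier node, can be sketched as a best-first search over partial trajectories. This is a minimal illustrative sketch, not the paper's implementation: the graph, the `score` function (standing in for the combined local/global trajectory score), and all names are hypothetical.

```python
import heapq

def fast_decode(graph, start, goal_test, score, max_steps=100):
    """Sketch of frontier-aware decoding with backtracking.

    graph: dict mapping node -> list of neighbor nodes
    score(path): hypothetical combined local/global score of a
        partial trajectory (higher is better)
    """
    # Max-heap of frontier candidates, keyed by trajectory score
    # (negated, since heapq is a min-heap). The counter breaks ties
    # so heapq never tries to compare paths.
    frontier = [(-score([start]), 0, [start])]
    counter = 0
    visited = set()
    while frontier and max_steps > 0:
        max_steps -= 1
        # Pop the globally best partial trajectory. If the previous
        # greedy expansion led to a dead end, this step *backtracks*
        # to an earlier, better-scoring frontier node.
        _, _, path = heapq.heappop(frontier)
        node = path[-1]
        if goal_test(node):
            return path
        if node in visited:
            continue
        visited.add(node)
        # Expand every neighbor onto the frontier; acting greedily
        # falls out naturally when the local continuation also has
        # the highest combined score.
        for nxt in graph[node]:
            if nxt not in visited:
                counter += 1
                heapq.heappush(frontier, (-score(path + [nxt]), counter, path + [nxt]))
    return None
```

A toy run shows the backtracking behavior: the step A→B looks locally attractive but leads to a dead end, so the search rewinds to the frontier node C and reaches the goal G via A→C→G.

```python
graph = {'A': ['B', 'C'], 'B': ['D'], 'C': ['G'], 'D': [], 'G': []}
step = {('A', 'B'): 0.9, ('A', 'C'): 0.5, ('B', 'D'): 0.1, ('C', 'G'): 0.8}

def score(path):
    # Hypothetical trajectory score: sum of per-step scores.
    return sum(step[(a, b)] for a, b in zip(path, path[1:]))

print(fast_decode(graph, 'A', lambda n: n == 'G', score))  # ['A', 'C', 'G']
```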
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Vision-and-Language Navigation | R2R (val unseen) | Success Rate (SR) | 63 | 260 |
| Vision-and-Language Navigation | R2R (test unseen) | Success Rate (SR) | 61 | 122 |
| Vision-and-Language Navigation | R2R (val seen) | Success Rate (SR) | 70 | 120 |
| Vision-and-Language Navigation | R2R Unseen (test) | Success Rate (SR) | 61 | 116 |
| Vision-and-Language Navigation | Room-to-Room (R2R) Unseen (val) | Success Rate (SR) | 63 | 52 |
| Vision-and-Language Navigation | R4R unseen (val) | Success Rate (SR) | 13.3 | 52 |
| Navigation | REVERIE Unseen (test) | Success Rate (SR) | 14.18 | 43 |
| Navigation | REVERIE (val unseen) | Success Rate (SR) | 10.08 | 34 |
| Remote Grounding | REVERIE Unseen (test) | Remote Grounding Success (RGS) | 7.07 | 33 |
| Vision-and-Language Navigation | Room-to-Room (R2R) Seen (val) | Navigation Error (NE) | 3.13 | 32 |