Sim-2-Sim Transfer for Vision-and-Language Navigation in Continuous Environments

About

Recent work in Vision-and-Language Navigation (VLN) has presented two environmental paradigms with differing realism -- the standard VLN setting built on topological environments where navigation is abstracted away, and the VLN-CE setting where agents must navigate continuous 3D environments using low-level actions. Despite sharing the high-level task and even the underlying instruction-path data, performance on VLN-CE lags behind VLN significantly. In this work, we explore this gap by transferring an agent from the abstract environment of VLN to the continuous environment of VLN-CE. We find that this sim-2-sim transfer is highly effective, improving over the prior state of the art in VLN-CE by +12% success rate. While this demonstrates the potential for this direction, the transfer does not fully retain the original performance of the agent in the abstract setting. We present a sequence of experiments to identify what differences result in performance degradation, providing clear directions for further improvement.

Jacob Krantz, Stefan Lee • 2022

Related benchmarks

Task	Dataset	Result
Vision-Language Navigation	R2R-CE (val-unseen)	Success Rate (SR): 43	779
Vision-Language Navigation	RxR-CE (val-unseen)	SR: 26.5	512
Vision-and-Language Navigation	R2R (val unseen)	Success Rate (SR): 43	476
Vision-and-Language Navigation	R2R-CE (val-seen)	SR: 52	103
Vision-Language Navigation	VLN-CE R2R (val unseen)	Navigation Error (NE): 6.07	76
Vision-and-Language Navigation	R2R-CE (test-unseen)	SR: 44	63
Vision-and-Language Navigation	R2R-CE v1.0 (val unseen)	SR (Success Rate): 43	61
Vision-and-Language Navigation	R2R-CE unseen continuous (val)	SR: 43	35
Vertical Perception	NavSpace	Navigation Error (NE): 6.72	30
Precise Movement	NavSpace	Navigation Error (NE): 7.46	27

