
Vision-and-Language Navigation via Causal Learning

About

In the pursuit of robust and generalizable environment perception and language understanding, the ubiquitous challenge of dataset bias continues to plague vision-and-language navigation (VLN) agents, hindering their performance in unseen environments. This paper introduces the generalized cross-modal causal transformer (GOAT), a pioneering solution rooted in the paradigm of causal inference. By delving into both observable and unobservable confounders within vision, language, and history, we propose the back-door and front-door adjustment causal learning (BACL and FACL) modules to promote unbiased learning by comprehensively mitigating potential spurious correlations. Additionally, to capture global confounder features, we propose a cross-modal feature pooling (CFP) module supervised by contrastive learning, which is also shown to be effective in improving cross-modal representations during pre-training. Extensive experiments across multiple VLN datasets (R2R, REVERIE, RxR, and SOON) underscore the superiority of our proposed method over previous state-of-the-art approaches. Code is available at https://github.com/CrystalSixone/VLN-GOAT.
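For readers unfamiliar with the two adjustments named above, the standard textbook forms from causal inference are sketched below. The notation is generic and is an assumption made here for illustration, not the paper's own: X is the input, Y the predicted output, Z an observed confounder, and M a mediator between X and Y. How GOAT instantiates these quantities inside the BACL and FACL modules is not specified in this abstract.

```latex
% Back-door adjustment: the confounder Z is observable, so stratify over it.
P\bigl(Y \mid do(X = x)\bigr) = \sum_{z} P(Y \mid X = x, Z = z)\, P(Z = z)

% Front-door adjustment: the confounder is unobservable, so route the
% effect through a mediator M that is shielded from it.
P\bigl(Y \mid do(X = x)\bigr) = \sum_{m} P(M = m \mid X = x) \sum_{x'} P(Y \mid M = m, X = x')\, P(X = x')
```

In both cases the interventional distribution on the left replaces the ordinary conditional P(Y | X), which is the quantity that absorbs spurious correlations introduced by the confounder.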

Liuyi Wang, Zongtao He, Ronghao Dang, Mengjiao Shen, Chengju Liu, Qijun Chen • 2024

Related benchmarks

Task | Dataset | Metric | Result | Rank
Vision-and-Language Navigation | R2R (val unseen) | Success Rate (SR) | 78 | 344
Vision-and-Language Navigation | REVERIE (val unseen) | SPL | 36.7 | 173
Vision-Language Navigation | R2R Unseen (test) | SR | 74.57 | 134
Vision-and-Language Navigation | R2R (val seen) | Success Rate (SR) | 83.74 | 68
Vision-and-Language Navigation | REVERIE Unseen (test) | Success Rate (SR) | 57.72 | 59
Vision-and-Language Navigation | RxR (Room-Across-Room) unseen (val) | SR (Success Rate) | 68.2 | 32
Vision-and-Language Navigation | REVERIE seen (val) | SR | 78.64 | 28
Vision-and-Language Navigation | SOON (val unseen) | SPL | 28.1 | 25
Vision-and-Language Navigation | RxR seen (val) | SR | 74.1 | 21
Vision-and-Language Navigation | SOON Unseen (test) | SR | 40.5 | 9
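
The results above are reported as SR (Success Rate) and SPL (Success weighted by Path Length). As a minimal sketch of how these two metrics are commonly computed in the VLN literature, assuming per-episode success flags, shortest-path lengths, and agent path lengths are already available: the function names and the toy episode data below are hypothetical and are not taken from the GOAT code base.

```python
from typing import Sequence


def success_rate(successes: Sequence[bool]) -> float:
    """Percentage of episodes in which the agent reached the goal."""
    return 100.0 * sum(successes) / len(successes)


def spl(
    successes: Sequence[bool],
    shortest_lengths: Sequence[float],
    path_lengths: Sequence[float],
) -> float:
    """Success weighted by Path Length, as a percentage.

    Each successful episode contributes l / max(p, l), where l is the
    shortest-path length (assumed > 0) and p is the length of the path
    the agent actually took. Failed episodes contribute 0.
    """
    total = 0.0
    for ok, l, p in zip(successes, shortest_lengths, path_lengths):
        if ok:
            total += l / max(p, l)
    return 100.0 * total / len(successes)


# Hypothetical toy data: four episodes, one failure, one detour.
successes = [True, True, False, True]
shortest = [10.0, 10.0, 10.0, 10.0]
taken = [10.0, 20.0, 15.0, 8.0]

sr_value = success_rate(successes)
spl_value = spl(successes, shortest, taken)
```

SPL can never exceed SR, because each successful episode is scaled by a factor of at most 1. This is why a benchmark row may show a noticeably lower SPL than the SR for the same split.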
