
VLN-PETL: Parameter-Efficient Transfer Learning for Vision-and-Language Navigation

About

Performance on Vision-and-Language Navigation (VLN) tasks has progressed rapidly in recent years thanks to large pre-trained vision-and-language models. However, fully fine-tuning the pre-trained model for every downstream VLN task is becoming costly because of the considerable model size. Parameter-Efficient Transfer Learning (PETL), a recent research hotspot, shows great potential for efficiently tuning large pre-trained models on common CV and NLP tasks: it exploits most of the representation knowledge in the pre-trained model while tuning only a minimal set of parameters. However, simply applying existing PETL methods to the more challenging VLN tasks may cause non-trivial performance degradation. We therefore present the first study of PETL methods for VLN tasks and propose a VLN-specific PETL method named VLN-PETL. Specifically, we design two PETL modules: the Historical Interaction Booster (HIB) and the Cross-modal Interaction Booster (CIB). We then combine these two modules with several existing PETL methods to form the integrated VLN-PETL. Extensive experiments on four mainstream VLN tasks (R2R, REVERIE, NDH, RxR) demonstrate the effectiveness of VLN-PETL, which achieves performance comparable to or even better than full fine-tuning and outperforms other PETL methods by promising margins.
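To make the "minimal set of parameters" idea concrete, the sketch below counts parameters for a generic bottleneck adapter (down-projection, nonlinearity, up-projection) added to each layer of a frozen Transformer encoder. This is an illustration of PETL in general, not the HIB or CIB modules from the paper; the layer layout, the sizes (12 layers, hidden size 768, feed-forward size 3072), the bottleneck width of 64, and all function names are assumptions chosen for the example.

```python
def transformer_layer_params(d_model: int, d_ffn: int) -> int:
    """Weights and biases of one standard Transformer encoder layer (assumed layout)."""
    attention = 4 * (d_model * d_model + d_model)  # Q, K, V and output projections
    ffn = (d_model * d_ffn + d_ffn) + (d_ffn * d_model + d_model)
    layer_norms = 2 * 2 * d_model  # two LayerNorms, each with scale and shift
    return attention + ffn + layer_norms


def bottleneck_adapter_params(d_model: int, bottleneck: int) -> int:
    """Hypothetical adapter: linear down-projection plus linear up-projection, with biases."""
    down = d_model * bottleneck + bottleneck
    up = bottleneck * d_model + d_model
    return down + up


def trainable_fraction(num_layers: int, d_model: int, d_ffn: int, bottleneck: int) -> float:
    """Share of all parameters that are trained when only the adapters are tuned."""
    frozen = num_layers * transformer_layer_params(d_model, d_ffn)
    tuned = num_layers * bottleneck_adapter_params(d_model, bottleneck)
    return tuned / (frozen + tuned)


if __name__ == "__main__":
    # Assumed sizes for illustration only; they are not taken from the paper.
    fraction = trainable_fraction(num_layers=12, d_model=768, d_ffn=3072, bottleneck=64)
    print(f"Trainable share with one adapter per layer: {fraction:.2%}")
```

Under these assumed sizes, the adapters account for a little over 1% of the encoder's parameters, which is the kind of saving that motivates PETL over full fine-tuning.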

Yanyuan Qiao, Zheng Yu, Qi Wu • 2023

Related benchmarks

Task | Dataset | Metric | Result | Rank
Vision-and-Language Navigation | R2R (val unseen) | Success Rate (SR) | 65.47 | 260
Vision-and-Language Navigation | REVERIE (val unseen) | SPL | 27.67 | 129
Vision-Language Navigation | R2R (test unseen) | SR | 63 | 122
Vision-Language Navigation | R2R (val seen) | Success Rate (SR) | 72.28 | 120
Vision-Language Navigation | R2R Unseen (test) | SR | 63.22 | 116
Navigation | REVERIE Unseen (test) | SR | 30.83 | 43
Vision-and-Language Navigation | R2R (test) | SPL (Success weighted Path Length) | 58 | 38
Navigation | REVERIE (val unseen) | Success Rate (SR) | 31.81 | 34
Remote Grounding | REVERIE Unseen (test) | RGS | 15.13 | 33
Vision-and-Language Navigation | RxR (Room-Across-Room) unseen (val) | SR (Success Rate) | 57.95 | 26
Showing 10 of 21 rows
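
For readers unfamiliar with the two most common metrics listed here, the following is a minimal sketch of how Success Rate (SR) and Success weighted by Path Length (SPL) are usually computed over a set of navigation episodes. The `Episode` class and its field names are hypothetical, introduced only for this example; the benchmark figures above are reported as percentages, whereas this code returns fractions in [0, 1]. Path lengths are assumed to be positive.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Episode:
    """One navigation episode (hypothetical record layout for illustration)."""
    success: bool                # did the agent stop close enough to the goal?
    shortest_path_length: float  # length of the shortest start-to-goal path
    agent_path_length: float     # length of the path the agent actually took


def success_rate(episodes: Sequence[Episode]) -> float:
    """Fraction of episodes that ended in success."""
    return sum(e.success for e in episodes) / len(episodes)


def spl(episodes: Sequence[Episode]) -> float:
    """Success weighted by path length: successful episodes are discounted
    by the ratio of shortest path length to the agent's path length."""
    total = 0.0
    for e in episodes:
        if e.success:
            total += e.shortest_path_length / max(e.agent_path_length,
                                                  e.shortest_path_length)
    return total / len(episodes)


if __name__ == "__main__":
    episodes = [
        Episode(success=True, shortest_path_length=10.0, agent_path_length=10.0),
        Episode(success=True, shortest_path_length=10.0, agent_path_length=20.0),
        Episode(success=False, shortest_path_length=10.0, agent_path_length=5.0),
    ]
    print(f"SR  = {success_rate(episodes):.3f}")
    print(f"SPL = {spl(episodes):.3f}")
```

SPL can never exceed SR: an agent that succeeds but takes a detour earns only partial credit for that episode, which is why the two metrics are typically reported together.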
