Turning Adaptation into Assets: Cross-Domain Bridging for Online Vision-Language Navigation
About
Navigating under non-stationary environment shifts poses a critical challenge for a Vision-and-Language Navigation (VLN) agent deployed in the wild. Yet, existing Test-Time Adaptation (TTA) methods for VLN largely treat online adaptation as transient, isolated updates, leading to catastrophic forgetting and negative transfer. To overcome these issues, we propose Inter-Domain BridgE with Historical Assets (IDEA), a novel TTA framework that transforms adaptation into the accumulation and composition of assets. Specifically, IDEA introduces soft prompts optimized via a Fisher-guided weighting scheme to capture the transferable knowledge. These optimized prompts are then augmented with domain coordinates to form a dynamic asset library. Leveraging this library, IDEA constructs a cross-domain bridge by projecting the target domain onto the convex hull of historical knowledge. These designs form a complementary loop: the evolving library underpins bridge construction, while the bridge provides superior initialization to accelerate asset optimization. Extensive experiments across REVERIE, R2R, and R2R-CE benchmarks demonstrate the consistent superiority of IDEA over existing methods, showcasing its ability to enable training-free adaptation via asset sharing.
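The bridging step can be illustrated with a small numerical sketch. The abstract does not say how domain coordinates are computed, how the projection is solved, or how stored prompts are combined, so everything below is an assumption made for illustration: coordinates are plain vectors, the projection is solved by projected gradient descent over simplex weights, and the bridged prompt is the matching convex combination of stored prompts. The names `project_to_simplex`, `hull_weights`, and `bridge_prompt` are hypothetical and do not come from the paper.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of a vector onto the probability simplex."""
    n = v.size
    u = np.sort(v)[::-1]                      # sort in descending order
    css = np.cumsum(u) - 1.0
    ks = np.arange(1, n + 1)
    rho = np.nonzero(u - css / ks > 0)[0][-1]  # last index satisfying the condition
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)


def hull_weights(coords, target, steps=500):
    """Simplex weights w minimizing ||coords.T @ w - target||^2.

    coords: (k, d) array of historical domain coordinates (assumed form).
    target: (d,) coordinate of the new domain (assumed form).
    """
    k = coords.shape[0]
    w = np.full(k, 1.0 / k)                    # start from uniform weights
    # Step size 1/L, where L bounds the gradient's Lipschitz constant.
    lipschitz = 2.0 * np.linalg.norm(coords, 2) ** 2
    lr = 1.0 / max(lipschitz, 1e-12)
    for _ in range(steps):
        grad = 2.0 * coords @ (coords.T @ w - target)
        w = project_to_simplex(w - lr * grad)
    return w


def bridge_prompt(prompts, coords, target):
    """Compose an initial prompt as a convex combination of stored prompts.

    prompts: (k, length, dim) array of stored soft prompts (assumed form).
    """
    w = hull_weights(coords, target)
    return np.tensordot(w, prompts, axes=1), w


if __name__ == "__main__":
    coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
    prompts = np.stack([np.full((2, 3), float(i)) for i in range(3)])
    init, w = bridge_prompt(prompts, coords, np.array([0.25, 0.25]))
    print("weights:", np.round(w, 3))
    print("bridged prompt shape:", init.shape)
```

When the target coordinate lies inside the hull, the weights reproduce it exactly; when it lies outside, they select the closest point on the hull, so the composed prompt never extrapolates beyond the stored assets.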
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Vision-Language Navigation | R2R-CE (val-unseen) | Success Rate (SR) | 62 | 677 |
| Vision-and-Language Navigation | R2R (val unseen) | Success Rate (SR) | 76 | 448 |
| Vision-and-Language Navigation | REVERIE (val unseen) | SPL | 38.03 | 225 |
| Vision-Language Navigation | R2R (val seen) | Success Rate (SR) | 83 | 150 |
| Vision-and-Language Navigation | REVERIE Unseen (test) | Success Rate (SR) | 55.12 | 110 |
| Vision-and-Language Navigation | R2R-CE (val-seen) | SR | 73 | 79 |
| Vision-and-Language Navigation | REVERIE seen (val) | SR | 78.24 | 64 |