
The Essence of Balance for Self-Improving Agents in Vision-and-Language Navigation

About

In vision-and-language navigation (VLN), self-improvement from policy-induced experience, using only standard VLN action supervision, critically depends on balancing behavioral diversity and learning stability, which governs whether the agent can extract a reliable learning signal for improvement. Increasing behavioral diversity is necessary to expose alternative action hypotheses but can destabilize policy-induced learning signals, whereas overly conservative stability constraints suppress exploration and induce early commitment, making reliable self-improvement difficult. To address this challenge, we propose Stability-Diversity Balance (SDB), a plug-and-play mechanism for balanced self-improvement in VLN. SDB expands each decision step into multiple latent behavioral hypotheses by applying controlled shifts in the instruction-conditioned hidden states, and then performs reliability-aware soft evaluation and aggregation to retain diverse yet instruction-consistent alternatives during learning. An explicit regularizer further constrains hypothesis interactions, preventing excessive drift or premature collapse of hypothesis diversity and stabilizing self-improvement without discarding training signals. Experiments on R2R, SOON, and REVERIE show consistent improvements; for example, on REVERIE val-unseen, SDB improves SPL from 33.73 to 35.93 and OSR from 51.07 to 54.25.
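The abstract names three ingredients (controlled shifts of the hidden state, reliability-aware soft aggregation, and a regularizer against drift or collapse) but gives no equations. The sketch below is one hypothetical way to wire them together at a single decision step, not the authors' implementation. Every name and choice in it is an assumption made for illustration: the linear policy head `head`, the fixed-norm random shift of size `eps`, reliability scored as negative KL divergence to the unshifted policy, diversity measured as mean pairwise total-variation distance, and the `band` penalty that keeps diversity between a lower and an upper bound.

```python
import math
import random


def softmax(xs, tau=1.0):
    """Numerically stable softmax with temperature tau."""
    m = max(xs)
    exps = [math.exp((x - m) / tau) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def action_probs(hidden, head):
    """Action distribution from a hypothetical linear head (one weight row per action)."""
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in head]
    return softmax(logits)


def expand_hypotheses(hidden, k, eps, rng):
    """Return k copies of `hidden`, each shifted by exactly eps in L2 norm."""
    out = []
    for _ in range(k):
        d = [rng.gauss(0.0, 1.0) for _ in hidden]
        norm = math.sqrt(sum(x * x for x in d)) or 1.0
        out.append([h + eps * x / norm for h, x in zip(hidden, d)])
    return out


def kl(p, q):
    """KL(p || q) for discrete distributions, guarded against log(0)."""
    return sum(pi * math.log(pi / max(qi, 1e-12)) for pi, qi in zip(p, q) if pi > 0)


def balanced_step_sketch(hidden, head, k=4, eps=0.5, tau=0.5, band=(0.01, 0.5), seed=0):
    """Illustrative only: expand, softly weight, aggregate, and penalize. Requires k >= 2."""
    rng = random.Random(seed)
    base = action_probs(hidden, head)
    probs = [action_probs(h, head) for h in expand_hypotheses(hidden, k, eps, rng)]

    # Assumed reliability score: hypotheses closer to the unshifted policy rank higher.
    reliability = [-kl(p, base) for p in probs]
    weights = softmax(reliability, tau)  # soft weighting, no hypothesis is discarded

    # Aggregate the hypotheses into one action distribution.
    mixed = [sum(w * p[a] for w, p in zip(weights, probs)) for a in range(len(base))]

    # Assumed diversity measure: mean pairwise total-variation distance.
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    diversity = sum(
        0.5 * sum(abs(a - b) for a, b in zip(probs[i], probs[j])) for i, j in pairs
    ) / len(pairs)

    # Penalize collapse (diversity below lo) and drift (diversity above hi).
    lo, hi = band
    penalty = max(0.0, lo - diversity) ** 2 + max(0.0, diversity - hi) ** 2
    return mixed, weights, penalty
```

In this sketch, `eps` is the diversity knob and `band` is the stability knob: raising `eps` spreads the hypotheses apart, while the penalty grows once their spread leaves the allowed band in either direction.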

Zhen Liu, Yuhan Liu, Jinjun Wang, Jianyi Liu, Wei Song, Jingwen Fu• 2026

Related benchmarks

Task | Dataset | Result | Rank
Vision-and-Language Navigation | R2R (val unseen) | Success Rate (SR) 72 | 448
Vision-and-Language Navigation | REVERIE (val unseen) | SPL 35.93 | 225
Vision-Language Navigation | R2R (test unseen) | SR 70 | 149
Vision-and-Language Navigation | REVERIE Unseen (test) | Success Rate (SR) 54.92 | 110
Vision-and-Language Navigation | REVERIE seen (val) | SR 69.5 | 64
Vision-and-Language Navigation | SOON (val unseen) | SPL 25 | 31
Vision-Language Navigation | SOON (test-unseen) | OSR 39.29 | 7
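For readers unfamiliar with the metrics listed above, here is a minimal sketch of how SR, OSR, and SPL are commonly computed from per-episode records, using the usual definition of SPL as success weighted by the ratio of shortest-path length to the longer of the shortest and actual path lengths. The record field names (`success`, `oracle_success`, `path_len`, `shortest_len`) are hypothetical. Benchmarks differ in how success itself is judged, so the sketch takes the success flags as given rather than deriving them.

```python
def nav_metrics(episodes):
    """Compute SR, OSR, and SPL (in percent) from per-episode records.

    Each episode is a dict with hypothetical keys:
      success (bool): the agent stopped at an acceptable location
      oracle_success (bool): some point on the trajectory was acceptable
      path_len (float): length of the path the agent actually took
      shortest_len (float): shortest-path length from start to goal (> 0)
    """
    n = len(episodes)
    sr = sum(e["success"] for e in episodes) / n
    osr = sum(e["oracle_success"] for e in episodes) / n
    # Failed episodes contribute zero; successful ones are discounted by path efficiency.
    spl = sum(
        e["shortest_len"] / max(e["path_len"], e["shortest_len"])
        for e in episodes
        if e["success"]
    ) / n
    return {"SR": 100 * sr, "OSR": 100 * osr, "SPL": 100 * spl}


episodes = [
    {"success": True, "oracle_success": True, "path_len": 10.0, "shortest_len": 10.0},
    {"success": True, "oracle_success": True, "path_len": 20.0, "shortest_len": 10.0},
    {"success": False, "oracle_success": True, "path_len": 15.0, "shortest_len": 10.0},
    {"success": False, "oracle_success": False, "path_len": 8.0, "shortest_len": 10.0},
]
metrics = nav_metrics(episodes)
```

In this toy example, two of four episodes succeed, three of four pass at some point on the trajectory, and the second successful episode takes a path twice as long as necessary, so SPL falls below SR.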
