The Essence of Balance for Self-Improving Agents in Vision-and-Language Navigation

About

In vision-and-language navigation (VLN), self-improvement from policy-induced experience, using only standard VLN action supervision, critically depends on balancing behavioral diversity and learning stability, which governs whether the agent can extract a reliable learning signal for improvement. Increasing behavioral diversity is necessary to expose alternative action hypotheses but can destabilize policy-induced learning signals, whereas overly conservative stability constraints suppress exploration and induce early commitment, making reliable self-improvement difficult. To address this challenge, we propose Stability-Diversity Balance (SDB), a plug-and-play mechanism for balanced self-improvement in VLN. SDB expands each decision step into multiple latent behavioral hypotheses by applying controlled shifts in the instruction-conditioned hidden states, and then performs reliability-aware soft evaluation and aggregation to retain diverse yet instruction-consistent alternatives during learning. An explicit regularizer further constrains hypothesis interactions, preventing excessive drift or premature collapse of hypothesis diversity and stabilizing self-improvement without discarding training signals. Experiments on R2R, SOON, and REVERIE show consistent improvements; for example, on REVERIE val-unseen, SDB improves SPL from 33.73 to 35.93 and OSR from 51.07 to 54.25.
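
The abstract describes SDB only in words, so the following is a minimal Python sketch of one decision step under stated assumptions, not the authors' implementation. Everything concrete here is hypothetical: the linear policy head `weights`, random unit-direction shifts of size `eps`, a KL-to-base-policy reliability proxy, softmax weighting with temperature `tau`, and a band penalty on mean pairwise symmetric KL as the regularizer. It is meant only to illustrate how hypothesis expansion, soft aggregation, and a diversity-bounding term could fit together.

```python
import math
import random


def softmax(xs, tau=1.0):
    """Numerically stable softmax with temperature tau."""
    m = max(xs)
    exps = [math.exp((x - m) / tau) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, h):
    """Hypothetical policy head: one logit per candidate action."""
    return [sum(w * x for w, x in zip(row, h)) for row in weights]


def expand_hypotheses(h, k, eps, rng):
    """Make k copies of h, each shifted by eps along a random unit direction."""
    shifted = []
    for _ in range(k):
        d = [rng.gauss(0.0, 1.0) for _ in h]
        norm = math.sqrt(sum(x * x for x in d)) or 1.0
        shifted.append([x + eps * dx / norm for x, dx in zip(h, d)])
    return shifted


def kl(p, q):
    """KL(p || q) for strictly positive distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def sdb_step(h, weights, k=4, eps=0.5, tau=0.1, band=(0.01, 0.5), seed=0):
    """One illustrative step: expand, softly weight, aggregate, regularize."""
    rng = random.Random(seed)
    base = softmax(linear(weights, h))
    hyps = [softmax(linear(weights, hk))
            for hk in expand_hypotheses(h, k, eps, rng)]

    # Reliability proxy (assumption): hypotheses closer to the base policy
    # get larger soft weights; none is discarded outright.
    rel = softmax([-kl(p, base) for p in hyps], tau)
    agg = [sum(r * p[a] for r, p in zip(rel, hyps)) for a in range(len(base))]

    # Regularizer (assumption): keep the mean pairwise symmetric KL inside
    # a band, penalizing collapse (too small) and drift (too large).
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    spread = 0.0
    if pairs:
        spread = sum(kl(hyps[i], hyps[j]) + kl(hyps[j], hyps[i])
                     for i, j in pairs) / (2 * len(pairs))
    lo, hi = band
    reg = max(0.0, lo - spread) ** 2 + max(0.0, spread - hi) ** 2
    return agg, rel, reg
```

Calling `sdb_step(h, weights)` with a hidden-state vector and a weight matrix (one row per action) returns the aggregated action distribution, the per-hypothesis soft weights, and the regularization penalty. In this sketch, setting `eps=0.0` makes all hypotheses identical, so the penalty is nonzero because the spread falls below the lower band edge.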

Zhen Liu, Yuhan Liu, Jinjun Wang, Jianyi Liu, Wei Song, Jingwen Fu • 2026

Related benchmarks

Task	Dataset	Result
Vision-and-Language Navigation	R2R (val unseen)	Success Rate (SR) 72	476
Vision-and-Language Navigation	REVERIE (val unseen)	SPL 35.93	237
Vision-Language Navigation	R2R (test unseen)	SR 70	162
Vision-and-Language Navigation	REVERIE Unseen (test)	Success Rate (SR) 54.92	110
Vision-and-Language Navigation	REVERIE seen (val)	SR 69.5	64
Vision-and-Language Navigation	SOON (val unseen)	SPL 25	37
Vision-Language Navigation	SOON (test-unseen)	OSR 39.29	7

