CLASH: Collaborative Large-Small Hierarchical Framework for Continuous Vision-and-Language Navigation

About

Vision-and-Language Navigation (VLN) requires robots to follow natural language instructions and navigate complex environments without prior maps. While recent vision-language large models demonstrate strong reasoning abilities, they often underperform task-specific panoramic small models in VLN tasks. To address this, we propose CLASH (Collaborative Large-Small Hierarchy), a VLN-CE framework that integrates a reactive small-model planner (RSMP) with a reflective large-model reasoner (RLMR). RSMP adopts a causal-learning-based dual-branch architecture to enhance generalization, while RLMR leverages panoramic visual prompting with chain-of-thought reasoning to support interpretable spatial understanding and navigation. We further introduce an uncertainty-aware collaboration mechanism (UCM) that adaptively fuses decisions from both models. For obstacle avoidance, in simulation, we replace the rule-based controller with a fully learnable point-goal policy, and in real-world deployment, we design a LiDAR-based clustering module for generating navigable waypoints and pair it with an online SLAM-based local controller. CLASH achieves state-of-the-art (SoTA) results (ranking 1-st) on the VLN-CE leaderboard, significantly improving SR and SPL on the test-unseen set over the previous SoTA methods. Real-world experiments demonstrate CLASH's strong robustness, validating its effectiveness in both simulation and deployment scenarios.

Liuyi Wang, Zongtao He, Jinlong Li, Ruihao Xia, Mengxian Hu, Chenpeng Yao, Chengju Liu, Yang Tang, Qijun Chen• 2025

Related benchmarks

Task	Dataset	Result
Vision-Language Navigation	R2R-CE (val-unseen)	Success Rate (SR)65	677
Vision-and-Language Navigation	R2R-CE (val-seen)	SR73	79
Vision-and-Language Navigation	R2R-CE (test-unseen)	SR66	63
Vision-and-Language Navigation	R2R-CE v1.0 (val unseen)	SR (Success Rate)65	44
Vision-and-Language Navigation	REVERIE CE (val unseen)	NE6.82	8
Vision-and-Language Navigation	REVERIE-CE (val seen)	NE5.38	5

Showing 6 of 6 rows

Other info

Follow for update

@wizwand_team Discord