SeqWalker: Sequential-Horizon Vision-and-Language Navigation with Hierarchical Planning

About

Sequential-Horizon Vision-and-Language Navigation (SH-VLN) presents a challenging scenario in which agents must sequentially execute multi-task navigation guided by complex, long-horizon language instructions. Current vision-and-language navigation models exhibit significant performance degradation on such multi-task instructions, as information overload impairs the agent's ability to attend to the details relevant to its current observations. To address this problem, we propose SeqWalker, a navigation model built on a hierarchical planning framework. SeqWalker features: (i) a High-Level Planner that dynamically selects contextually relevant sub-instructions from the global instruction based on the agent's current visual observations, thus reducing cognitive load; and (ii) a Low-Level Planner incorporating an Exploration-Verification strategy that leverages the inherent logical structure of instructions for trajectory error correction. To evaluate SH-VLN performance, we also extend the IVLN dataset and establish a new benchmark. Extensive experiments demonstrate the superiority of the proposed SeqWalker.
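As a rough illustration of the High-Level Planner idea described above, the sketch below picks the sub-instruction that best matches the current observation. It is a toy under stated assumptions, not the authors' implementation: the abstract does not say how relevance is scored, so cosine similarity over precomputed embeddings is assumed here, and the names `cosine`, `select_sub_instruction`, and the three-dimensional example vectors are all hypothetical.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_sub_instruction(
    sub_embs: Sequence[Sequence[float]],
    obs_emb: Sequence[float],
    start: int = 0,
) -> int:
    """Return the index (>= start) of the sub-instruction most similar to the observation.

    Assumption: relevance is plain cosine similarity; the paper may use a
    learned scorer instead. `start` restricts the search to sub-instructions
    that have not been completed yet.
    """
    if start >= len(sub_embs):
        raise ValueError("no remaining sub-instructions to select from")
    return max(range(start, len(sub_embs)), key=lambda i: cosine(sub_embs[i], obs_emb))


# Hypothetical long-horizon instruction, already split into sub-instructions.
sub_instructions = ["walk down the hallway", "enter the kitchen", "stop at the sink"]
# Hypothetical embeddings; a real system would obtain these from a text encoder.
sub_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Hypothetical embedding of the agent's current visual observation.
obs_emb = [0.1, 0.9, 0.2]

chosen = select_sub_instruction(sub_embs, obs_emb)
print(sub_instructions[chosen])
```

Feeding only the selected sub-instruction to the low-level policy, rather than the full instruction, is what the abstract refers to as reducing cognitive load; the Exploration-Verification strategy is not modeled in this sketch.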

Zebin Han, Xudong Wang, Baichen Liu, Qi Lyu, Zhenduo Shang, Jiahua Dong, Lianqing Liu, Zhi Han • 2026

Related benchmarks

Task	Dataset	Result (TL: Trajectory Length)
Iterative Vision-and-Language Navigation	IR2R-CE (val-seen)	TL 12.3	15
Vision-and-Language Navigation	IR2R-CE (val-unseen)	TL 11.4	9
Sequential-Horizon Navigation	SH IR2R-CE (val-unseen)	TL 17.3	8
Sequential-Horizon Navigation	SH IR2R-CE (val-seen)	TL 17.8	8
