
SeqWalker: Sequential-Horizon Vision-and-Language Navigation with Hierarchical Planning

About

Sequential-Horizon Vision-and-Language Navigation (SH-VLN) presents a challenging scenario in which agents must sequentially execute multi-task navigation guided by complex, long-horizon language instructions. Current vision-and-language navigation models exhibit significant performance degradation with such multi-task instructions, as information overload impairs the agent's ability to attend to the details relevant to its current observations. To address this problem, we propose SeqWalker, a navigation model built on a hierarchical planning framework. SeqWalker features: i) a High-Level Planner that dynamically selects contextually relevant sub-instructions from the global instruction based on the agent's current visual observations, thus reducing cognitive load; ii) a Low-Level Planner incorporating an Exploration-Verification strategy that leverages the inherent logical structure of instructions for trajectory error correction. To evaluate SH-VLN performance, we also extend the IVLN dataset and establish a new benchmark. Extensive experiments demonstrate the superiority of the proposed SeqWalker.
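As a rough illustration of the high-level planning idea described above, the sketch below picks the sub-instruction that best matches what the agent currently observes. It is a toy under stated assumptions, not the SeqWalker implementation: the names `split_instruction`, `relevance`, and `HighLevelPlanner` are hypothetical, observations are reduced to a set of object names, and relevance is plain word overlap rather than a learned vision-language score.

```python
import re
from dataclasses import dataclass


def split_instruction(instruction: str) -> list[str]:
    """Split a long instruction into ordered sub-instructions at sentence boundaries."""
    return [part.strip() for part in instruction.split(".") if part.strip()]


def relevance(sub_instruction: str, observed_objects: set[str]) -> int:
    """Toy score (assumption): number of observed object names the sub-instruction mentions."""
    words = set(re.findall(r"[a-z]+", sub_instruction.lower()))
    return len(words & observed_objects)


@dataclass
class HighLevelPlanner:
    """Hypothetical planner that tracks which sub-instruction is active."""

    sub_instructions: list[str]  # assumed non-empty
    current: int = 0  # index of the active sub-instruction

    def select(self, observed_objects: set[str]) -> str:
        # Consider only the active sub-instruction and its successor,
        # so the agent advances through the instruction in order.
        last = len(self.sub_instructions) - 1
        candidates = range(self.current, min(self.current + 1, last) + 1)
        # Prefer higher relevance; on a tie, stay on the earlier sub-instruction.
        self.current = max(
            candidates,
            key=lambda i: (relevance(self.sub_instructions[i], observed_objects), -i),
        )
        return self.sub_instructions[self.current]


subs = split_instruction("Walk past the sofa. Turn left at the table. Stop by the bed.")
planner = HighLevelPlanner(subs)
first = planner.select({"sofa"})
second = planner.select({"table"})
third = planner.select({"bed"})
```

Restricting the candidates to the active sub-instruction and its successor is one simple way to keep progress sequential; a real system would presumably score relevance from visual features rather than object-name overlap.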

Zebin Han, Xudong Wang, Baichen Liu, Qi Lyu, Zhenduo Shang, Jiahua Dong, Lianqing Liu, Zhi Han • 2026

Related benchmarks

Task | Dataset | Result (Trajectory Length, TL) | Rank
Iterative Vision-and-Language Navigation | IR2R-CE (val-seen) | 12.3 | 15
Vision-and-Language Navigation | IR2R-CE (val-unseen) | 11.4 | 9
Sequential-Horizon Navigation | SH IR2R-CE (val-unseen) | 17.3 | 8
Sequential-Horizon Navigation | SH IR2R-CE (val-seen) | 17.8 | 8
