
ABot-N0: Technical Report on the VLA Foundation Model for Versatile Embodied Navigation

About

Embodied navigation has long been fragmented by task-specific architectures. We introduce ABot-N0, a unified Vision-Language-Action (VLA) foundation model that achieves a "Grand Unification" across 5 core tasks: Point-Goal, Object-Goal, Instruction-Following, POI-Goal, and Person-Following. ABot-N0 utilizes a hierarchical "Brain-Action" architecture, pairing an LLM-based Cognitive Brain for semantic reasoning with a Flow Matching-based Action Expert for precise, continuous trajectory generation. To support large-scale learning, we developed the ABot-N0 Data Engine, curating 16.9M expert trajectories and 5.0M reasoning samples across 7,802 high-fidelity 3D scenes (10.7 $\text{km}^2$). ABot-N0 achieves new SOTA performance across 7 benchmarks, significantly outperforming specialized models. Furthermore, our Agentic Navigation System integrates a planner with hierarchical topological memory, enabling robust, long-horizon missions in dynamic real-world environments.
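The abstract does not spell out how the Flow Matching-based Action Expert produces a trajectory, so the following is only a generic sketch of flow-matching sampling, not the ABot-N0 implementation. It assumes a velocity field `velocity_fn(x, t)` (hypothetical name; in a real model this would be a learned network conditioned on the Cognitive Brain's features) and integrates it with forward Euler from Gaussian noise at t = 0 to a waypoint trajectory at t = 1. The `toy_velocity` function is a stand-in that uses the closed-form velocity of a straight-line probability path toward a fixed target, purely so the example runs without a trained model.

```python
import random


def sample_trajectory(velocity_fn, num_waypoints=5, num_steps=10, seed=0):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with forward Euler.

    x is a list of [x, y] waypoints. The initial x is Gaussian noise.
    All names here are illustrative assumptions, not the paper's API.
    """
    rng = random.Random(seed)
    x = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(num_waypoints)]
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = step * dt  # t stays strictly below 1 inside the loop
        v = velocity_fn(x, t)
        x = [[xi + dt * vi for xi, vi in zip(point, vel)]
             for point, vel in zip(x, v)]
    return x


def toy_velocity(x, t, target):
    """Stand-in for a learned network (assumption).

    For the straight-line path x_t = (1 - t) * x_0 + t * x_1, the velocity
    that carries x_t to a known endpoint x_1 is (x_1 - x_t) / (1 - t).
    """
    return [[(ti - xi) / (1.0 - t) for xi, ti in zip(point, goal)]
            for point, goal in zip(x, target)]


if __name__ == "__main__":
    # Hypothetical target: five waypoints spaced 0.5 m apart along the x-axis.
    target = [[0.5 * i, 0.0] for i in range(1, 6)]
    trajectory = sample_trajectory(lambda x, t: toy_velocity(x, t, target))
    for waypoint in trajectory:
        print([round(c, 3) for c in waypoint])
```

With a learned velocity field the integration loop stays the same; only `velocity_fn` changes, and the number of Euler steps trades sampling cost against accuracy.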

Zedong Chu, Shichao Xie, Xiaolong Wu, Yanfen Shen, Minghua Luo, Zhengbo Wang, Fei Liu, Xiaoxu Leng, Junjun Hu, Mingyang Yin, Jia Lu, Yingnan Guo, Kai Yang, Jiawei Han, Xu Chen, Yanqing Zhu, Yuxiang Zhao, Xin Liu, Yirong Yang, Ye He, Jiahang Wang, Yang Cai, Tianlin Zhang, Li Gao, Liu Liu, Mingchao Sun, Fan Jiang, Chiyu Wang, Zhicheng Liu, Hongyu Pan, Honglin Han, Zhining Gu, Kuan Yang, Jianfang Zhang, Di Jing, Zihao Guan, Wei Guo, Guoqing Liu, Di Yang, Xiangpo Yang, Menglin Yang, Hongguang Xing, Weiguo Li, Mu Xu • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Vision-Language Navigation | R2R-CE (val-unseen) | Success Rate (SR) | 66.4 | 677 |
| Vision-Language Navigation | RxR-CE (val-unseen) | SR | 69.3 | 426 |
| Object Goal Navigation | HM3D-OVON Seen (val) | SR | 55.3 | 65 |
| Object Goal Navigation | HM3D-OVON unseen (val) | Success Rate | 54 | 57 |
| Object Goal Navigation | HM3D-OVON Seen-Synonyms (val) | SR | 55.4 | 56 |
| Vision-Language Navigation | VLN-CE R2R (val unseen) | Navigation Error (NE) | 3.78 | 41 |
| Person-Following | EVT-Bench Single-Target Tracking (STT) single view | SR | 86.9 | 9 |
| Person-Following | EVT-Bench single view (Distracted Tracking) | SR | 66.7 | 9 |
| Person-Following | EVT-Bench Ambiguity Tracking (AT) single view | Success Rate (SR) | 67.3 | 8 |
| Open-Vocabulary Navigation | HM3D OVON | Success Rate (SR) | 54 | 8 |
Showing 10 of 13 rows
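
The listing reports Success Rate (SR) and Navigation Error (NE) without defining them. As a rough illustration only, the sketch below computes both from final agent positions and goal positions, assuming 2D coordinates in meters. The function names and the `threshold_m=3.0` default are assumptions for this example; each benchmark defines its own success criterion, so the exact threshold and stopping rules should be taken from the benchmark's evaluation protocol rather than from this sketch.

```python
import math


def _distance(a, b):
    """Euclidean distance between two (x, y) points in meters."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def success_rate(final_positions, goals, threshold_m=3.0):
    """Percentage of episodes that end within threshold_m of the goal.

    threshold_m is an assumed default, not taken from the listing.
    """
    hits = sum(1 for p, g in zip(final_positions, goals)
               if _distance(p, g) <= threshold_m)
    return 100.0 * hits / len(goals)


def navigation_error(final_positions, goals):
    """Mean distance (meters) between the final position and the goal."""
    total = sum(_distance(p, g) for p, g in zip(final_positions, goals))
    return total / len(goals)


if __name__ == "__main__":
    # Three made-up episodes with goal distances of 1 m, 4 m, and 0 m.
    finals = [(0.0, 0.0), (4.0, 0.0), (1.0, 1.0)]
    goals = [(0.0, 1.0), (0.0, 0.0), (1.0, 1.0)]
    print(success_rate(finals, goals))
    print(navigation_error(finals, goals))
```

Higher is better for SR, while lower is better for NE, which is why the two metrics should not be compared directly across rows.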
