NORA-1.5: A Vision-Language-Action Model Trained using World Model- and Action-based Preference Rewards

About

Vision-language-action (VLA) models have recently shown promising performance on a variety of embodied tasks, yet they still fall short in reliability and generalization, especially when deployed across different embodiments or real-world environments. In this work, we introduce NORA-1.5, a VLA model built by adding a flow-matching-based action expert to the pre-trained NORA backbone. This architectural enhancement alone yields substantial performance gains, enabling NORA-1.5 to outperform NORA and several state-of-the-art VLA models across both simulated and real-world benchmarks. To further improve robustness and task success, we develop a set of reward models for post-training VLA policies. Our rewards combine (i) an action-conditioned world model (WM) that evaluates whether generated actions lead toward the desired goal, and (ii) a deviation-from-ground-truth heuristic that distinguishes good actions from poor ones. Using these reward signals, we construct preference datasets and adapt NORA-1.5 to target embodiments through direct preference optimization (DPO). Extensive evaluations show that reward-driven post-training consistently improves performance in both simulation and real-robot settings, demonstrating that simple yet effective reward models can significantly improve VLA reliability. Our findings highlight NORA-1.5 and reward-guided post-training as a viable path toward more dependable embodied agents suitable for real-world deployment.

Chia-Yu Hung, Navonil Majumder, Haoyuan Deng, Liu Renhang, Yankang Ang, Amir Zadeh, Chuan Li, Dorien Herremans, Ziwei Wang, Soujanya Poria • 2025
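
The post-training recipe described in the abstract (score candidate actions with a reward, turn the scores into preference pairs, then optimize with DPO) can be illustrated with a minimal Python sketch. Everything named here is an assumption for illustration: `deviation_reward` uses a negative L2 distance as one plausible reading of the deviation-from-ground-truth heuristic, `build_preference_pair` simply takes the best and worst candidates, and the scalar log-probabilities passed to `dpo_loss` are placeholders. The abstract does not say how the two rewards are combined or how likelihoods are obtained for a flow-matching action expert, and the world-model reward is omitted entirely. Only the loss itself follows the standard DPO formulation.

```python
import math


def deviation_reward(action, expert_action):
    """Hypothetical reward: negative L2 distance to the ground-truth action.

    Higher is better; an action identical to the expert action scores 0.
    """
    return -math.sqrt(sum((a - e) ** 2 for a, e in zip(action, expert_action)))


def build_preference_pair(candidates, rewards):
    """Return (chosen, rejected): the highest- and lowest-reward candidates."""
    ranked = sorted(zip(rewards, range(len(candidates))))
    worst_idx, best_idx = ranked[0][1], ranked[-1][1]
    return candidates[best_idx], candidates[worst_idx]


def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for a single preference pair.

    loss = -log sigmoid(beta * ((logp_c - ref_c) - (logp_r - ref_r)))
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log sigmoid(m) = log(1 + exp(-m)), written in a numerically stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Toy example: three sampled 2-D actions scored against one expert action.
expert = [0.5, 0.5]
candidates = [[0.0, 0.0], [0.9, 0.9], [0.4, 0.5]]
rewards = [deviation_reward(c, expert) for c in candidates]
chosen, rejected = build_preference_pair(candidates, rewards)
```

In this toy run the candidate closest to the expert action becomes the chosen sample and the farthest becomes the rejected one. The loss equals log 2 when the policy and the reference model agree, and it falls as the policy assigns relatively more probability to the chosen action than the reference does.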

Related benchmarks

Task | Dataset | Result | Rank
Robot Manipulation | SimplerEnv Google Robot tasks Visual Matching | Pick Coke Can Success Rate: 94 | 62
Robot Manipulation | Diverse Manipulation Tasks Put S in S | PSR: 100 | 40
Multi-task Learning | LIBERO | Object Score: 96.4 | 18
Robot Manipulation | Diverse Manipulation Tasks Put U in U | PSR: 100 | 12
Robot Manipulation | Diverse Manipulation Tasks Move U to U | PSR: 70 | 12
Robot Manipulation | Diverse Manipulation Tasks Average | PSR: 83.13 | 4
