NORA-1.5: A Vision-Language-Action Model Trained using World Model- and Action-based Preference Rewards

About

Vision-language-action (VLA) models have recently shown promising performance on a variety of embodied tasks, yet they still fall short in reliability and generalization, especially when deployed across different embodiments or in real-world environments. In this work, we introduce NORA-1.5, a VLA model built on the pre-trained NORA backbone with an added flow-matching-based action expert. This architectural change alone yields substantial performance gains, enabling NORA-1.5 to outperform NORA and several state-of-the-art VLA models on both simulated and real-world benchmarks. To further improve robustness and task success, we develop a set of reward models for post-training VLA policies. Our rewards combine (i) an action-conditioned world model (WM) that evaluates whether generated actions lead toward the desired goal, and (ii) a deviation-from-ground-truth heuristic that distinguishes good actions from poor ones. Using these reward signals, we construct preference datasets and adapt NORA-1.5 to target embodiments through direct preference optimization (DPO). Extensive evaluations show that reward-driven post-training consistently improves performance in both simulation and real-robot settings, demonstrating that simple yet effective reward models can substantially improve VLA reliability. Our findings highlight NORA-1.5 and reward-guided post-training as a viable path toward more dependable embodied agents for real-world deployment.
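The abstract does not say exactly how the two reward signals are combined or how preference pairs are selected. The sketch below shows one plausible reading under stated assumptions. The world-model term is taken as the reduction in distance to the goal after applying an action to a predicted next state. The heuristic term is an L2 penalty on deviation from the ground-truth action, weighted by a hypothetical `lam`. Among sampled candidate actions, the highest-reward one becomes "chosen" and the lowest-reward one becomes "rejected." The pair is then scored with the standard DPO loss. All names (`combined_reward`, `make_preference_pair`, `dpo_loss`, `lam`, `beta`) and the toy integrator world model are illustrative assumptions and are not NORA-1.5's actual implementation.

```python
import math
from typing import Callable, List, Sequence, Tuple

Action = Tuple[float, ...]


def l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def combined_reward(action: Action, gt_action: Action, state: Sequence[float],
                    goal: Sequence[float],
                    world_model: Callable[[Sequence[float], Action], Sequence[float]],
                    lam: float = 1.0) -> float:
    """Hypothetical reward: WM goal progress minus a deviation-from-ground-truth penalty.

    Assumption: progress = how much closer the WM-predicted next state is to the goal.
    """
    predicted = world_model(state, action)
    wm_term = l2(state, goal) - l2(predicted, goal)  # positive if the action moves toward the goal
    deviation = l2(action, gt_action)                # heuristic: distance from the ground-truth action
    return wm_term - lam * deviation


def make_preference_pair(candidates: List[Action],
                         score: Callable[[Action], float]) -> Tuple[Action, Action]:
    """Rank candidates by reward and return (chosen, rejected) = (best, worst)."""
    ranked = sorted(candidates, key=score, reverse=True)
    return ranked[0], ranked[-1]


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one pair: -log sigmoid(beta * (policy margin - reference margin))."""
    margin = beta * ((logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected))
    return softplus(-margin)  # equals -log(sigmoid(margin))


def toy_wm(state: Sequence[float], action: Action) -> Tuple[float, ...]:
    """Hypothetical world model: next state = state + action."""
    return tuple(s + a for s, a in zip(state, action))


state, goal, gt = (0.0, 0.0), (1.0, 0.0), (0.5, 0.0)
cands = [(0.5, 0.0), (0.4, 0.1), (-0.5, 0.0)]
chosen, rejected = make_preference_pair(
    cands, lambda a: combined_reward(a, gt, state, goal, toy_wm))
print(chosen, rejected)  # (0.5, 0.0) (-0.5, 0.0)
```

In this toy setup, the action that matches the ground truth and moves toward the goal is preferred. The action that moves away from the goal is rejected. When the policy and reference log-probabilities give equal margins, the DPO loss equals log 2, and it falls below log 2 once the policy favors the chosen action more than the reference does.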

Chia-Yu Hung, Navonil Majumder, Haoyuan Deng, Liu Renhang, Yankang Ang, Amir Zadeh, Chuan Li, Dorien Herremans, Ziwei Wang, Soujanya Poria• 2025

Related benchmarks

Task	Dataset	Metric	Result	
Robot Manipulation	LIBERO	Object Achievement	96.4	1025
Robot Manipulation	LIBERO	Spatial Success Rate	97.3	223
Robot Manipulation	SimplerEnv Google Robot tasks Visual Matching	Pick Coke Can Success Rate	94	70
Robot Manipulation	Diverse Manipulation Tasks Put S in S	PSR	100	40
Multi-task Learning	LIBERO	Object Score	96.4	25
Robot Manipulation	Diverse Manipulation Tasks Put U in U	PSR	100	12
Robot Manipulation	Diverse Manipulation Tasks Move U to U	PSR	70	12
Robot Manipulation	Diverse Manipulation Tasks Average	PSR	83.13	4

