V-VLAPS: Value-Guided Planning for Vision-Language-Action Models

About

Vision-language-action (VLA) models provide strong action priors for robotic manipulation, but their reactive behavior can fail under distribution shift and long-horizon task structure. Recent VLA-guided planning methods improve execution by using pretrained policies to guide tree search, yet node selection still depends heavily on policy priors and visit-count exploration. Consequently, when the policy favors poor actions, the planner lacks a learned value signal to correct this bias. Prior work has shown that VLA representations encode rollout success and failure information, suggesting that they may also support value estimation during planning. We introduce Value-Guided Vision-Language-Action Planning and Search (V-VLAPS), which augments VLA-guided planning with a lightweight value head trained on offline VLA rollouts to predict Monte Carlo returns. These predictions guide Monte Carlo Tree Search toward higher-value branches. Across five LIBERO suites, V-VLAPS matches value-free planning baseline at the default search budget in aggregate, and analysis shows that many hard failures are root-level timeouts where predicted values are weakly separated. With a larger search budget, V-VLAPS improves over the baseline in all task suites with +6 percentage points on LIBERO-Object and +4 percentage points on LIBERO-10. Our results suggest that VLA representations can support not only failure prediction, but also value-guided planning when search reaches branches where value-based ranking matters.

Ke Ren, Ali Salamatian, Kieran Pattison, Cyrus Neary• 2026

Related benchmarks

Task	Dataset	Result
Planning	LIBERO Spatial Suite	Average MCTS Simulation28.8	33
Planning	LIBERO Object Suite	Average MCTS Simulations31.79	33
Robot Manipulation	LIBERO Object suite (test)	Task 0 Success Rate100	4
Robot Manipulation	LIBERO Spatial suite (test)	Task 0 Success Rate100	4

Showing 4 of 4 rows

Other info

Follow for update

@wizwand_team Discord