
VITA: Zero-Shot Value Functions via Test-Time Adaptation of Vision-Language Models

About

Vision-Language Models (VLMs) show promise as zero-shot goal-conditioned value functions, but their frozen pre-trained representations limit generalization and temporal reasoning. We introduce VITA, a zero-shot value function learning method that enhances both capabilities via test-time adaptation. At inference, a lightweight adaptation module is updated via a gradient step on a meta-learned self-supervised loss, such that each test-time update improves value estimation. By updating sequentially over a trajectory, VITA encodes history into its parameters, addressing the temporal reasoning limitations. To mitigate shortcut learning, we propose a dissimilarity-based sampling strategy that selects semantically diverse segments of the trajectory during training. In real-world robotic manipulation tasks, VITA generalizes from a single training environment to diverse out-of-distribution tasks, environments, and embodiments, outperforming the state-of-the-art zero-shot method using autoregressive VLMs. Furthermore, we demonstrate that VITA's zero-shot value estimates can be utilized for reward shaping in offline reinforcement learning, resulting in multi-task policies on the Meta-World benchmark that exceed the performance of those trained with the simulation's fuzzy-logic dense rewards. Project website: https://chziakas.github.io/vita/.
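The abstract does not specify the adapter architecture or the meta-learned loss, so the following is only a minimal sketch of sequential test-time adaptation under loudly stated assumptions. It substitutes the linear reconstruction loss used in test-time training (TTT) layers, ||W k - v||², where k and v are projections of the input. The frozen VLM is replaced by random embeddings, and goal conditioning is replaced by an elementwise product. The projections `K`, `V`, `Q` and the value head `w_head` are random placeholders for parameters that would be meta-learned offline. `adapt_and_score` and all parameter names are hypothetical and do not come from VITA. Because the adapter weights `W` persist from frame to frame, each value estimate depends on the trajectory history. This is the property the abstract relies on for temporal reasoning.

```python
import numpy as np

def adapt_and_score(frames, goal, K, V, Q, w_head, lr=0.1):
    """Hypothetical sketch: one self-supervised gradient step per frame, then a value.

    frames:  (T, d) stand-ins for frozen VLM frame embeddings
    goal:    (d,)   stand-in for a frozen goal-text embedding
    K, V, Q: (p, d) projections; placeholders for meta-learned parameters
    w_head:  (p,)   placeholder linear value head
    """
    p = K.shape[0]
    W = np.zeros((p, p))  # adapter weights: the only state updated at test time
    values = []
    for x in frames:
        z = x * goal                          # assumed goal conditioning (elementwise)
        z = z / (np.linalg.norm(z) + 1e-8)    # normalize so fixed-lr steps stay stable
        k, v, q = K @ z, V @ z, Q @ z
        residual = W @ k - v                  # error term of the loss ||W k - v||^2
        W = W - lr * 2.0 * np.outer(residual, k)  # one gradient step on that loss
        h = W @ q                             # feature read out with the adapted weights
        values.append(1.0 / (1.0 + np.exp(-(w_head @ h))))  # value squashed into (0, 1)
    return np.array(values)


rng = np.random.default_rng(0)
d, p, T = 16, 8, 5
frames = rng.normal(size=(T, d))
goal = rng.normal(size=d)
K, V, Q = rng.normal(size=(3, p, d)) / np.sqrt(d)
w_head = rng.normal(size=p)
values = adapt_and_score(frames, goal, K, V, Q, w_head)
print(values.shape)  # (5,)
```

Resetting `W` to zero at the start of each trajectory keeps episodes independent. A single fixed-size step per frame keeps the cost of each estimate constant along the trajectory.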

Christos Ziakas, Alessandra Russo · 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Value function estimation | BridgeData lm_pnp Environment Shift V2 | VOC | 72.5 | 7 |
| Value function estimation | BridgeData Environment Shift V2 (td_fold) | VOC | 70.9 | 7 |
| Value function estimation | BridgeData ms_sweep Environment Shift V2 | VOC | 0.49 | 7 |
| Value function estimation | BridgeData dt_ft_stack ES & EM V2 | VOC | 69.8 | 7 |
| Value function estimation | BridgeData Environment Shift V2 (ft_fold) | VOC | 65.8 | 7 |
| Value function estimation | BridgeData Environment Shift V2 (rd_fold) | VOC | 60.6 | 7 |
| Value function estimation | BridgeData Embodiment Shift dt_tk_pnp V2 | VOC | 0.82 | 7 |
| Value function estimation | BridgeData dt_rd_pnp ES & EM V2 | VOC | 69.5 | 7 |
| Expert vs. Non-Expert Trajectory Discrimination | BridgeData 5 scripted datasets V2 (in-distribution) | BinVOC | 1 | 7 |
| Value function estimation | BridgeData tk_pnp In-Distribution V2 | VOC | 0.782 | 7 |

Showing 10 of 12 rows.
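The VOC metric is, to our understanding, the value-order correlation used by Generative Value Learning (GVL). It is the rank correlation between a model's predicted per-frame values and the frames' chronological order, on trajectories assumed to progress monotonically toward the goal. The sketch below computes it as a Spearman correlation in NumPy. It is an illustrative reimplementation, not the benchmark's scoring code. The function names are hypothetical. The listing mixes scales (for example 72.5 and 0.782), and how each reported value was scaled is not stated here. This sketch returns a value in [-1, 1].

```python
import numpy as np

def average_ranks(x):
    """Ranks 0..n-1; tied entries get the mean of the positions they occupy."""
    x = np.asarray(x, dtype=float)
    ranks = np.empty(len(x))
    ranks[np.argsort(x, kind="mergesort")] = np.arange(len(x), dtype=float)
    for val in np.unique(x):
        tied = x == val
        ranks[tied] = ranks[tied].mean()
    return ranks

def value_order_correlation(pred_values):
    """Spearman correlation between predicted per-frame values and frame order."""
    pred_rank = average_ranks(pred_values)
    time_rank = np.arange(len(pred_rank), dtype=float)
    pr = pred_rank - pred_rank.mean()
    tr = time_rank - time_rank.mean()
    denom = np.sqrt((pr ** 2).sum() * (tr ** 2).sum())
    if denom == 0:
        return float("nan")  # undefined if all predictions tie or there is one frame
    return float((pr * tr).sum() / denom)

print(value_order_correlation([0.1, 0.3, 0.5, 0.9]))  # 1.0
print(value_order_correlation([0.9, 0.5, 0.3, 0.1]))  # -1.0
```

A perfectly monotone value estimate scores 1.0 regardless of its absolute values. This is why VOC measures ordering rather than calibration.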
