ExpReS-VLA: Specializing Vision-Language-Action Models Through Experience Replay and Retrieval

About

Vision-Language-Action (VLA) models such as OpenVLA demonstrate impressive zero-shot generalization across robotic manipulation tasks but struggle to adapt to specific deployment environments, where consistent high performance on a limited set of tasks is more valuable than broad generalization. We present EXPerience replayed, REtrieval augmented, Specialized VLA (ExpReS-VLA), a method that enables rapid on-device adaptation of pre-trained VLAs to target domains while preventing catastrophic forgetting through compressed experience replay and retrieval-augmented generation. Our approach maintains a memory-efficient buffer by storing embeddings extracted from OpenVLA's frozen vision backbone, reducing storage requirements by 97% compared to raw image-action pairs. During deployment, ExpReS-VLA retrieves the $k$ most similar past experiences using cosine similarity to augment training batches, while a prioritized experience replay buffer preserves recently successful trajectories. To leverage failed attempts, we introduce Thresholded Hybrid Contrastive Loss (THCL), enabling the model to learn from both successful and unsuccessful demonstrations. Experiments on the LIBERO benchmark show improvements over base OpenVLA from 82.6% to 93.1% on spatial reasoning and from 61% to 72.3% on long-horizon tasks, with gains across architectures including $\pi_0$ (+3.2 points) and OpenVLA-OFT (+1.7 points). Physical robot experiments across five tasks demonstrate 98% success on both in-distribution and out-of-distribution conditions, compared with 84.7% and 32%, respectively, for naive fine-tuning. Adaptation completes in 31 seconds using 12 demonstrations on a single RTX 5090.
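The retrieval step described above (selecting the $k$ stored experiences most similar to the current observation by cosine similarity) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `cosine_similarity` and `retrieve_top_k`, the plain-list buffer, and the 3-dimensional toy embeddings are all assumptions made for illustration. In the paper, the embeddings come from OpenVLA's frozen vision backbone.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(
    query: Sequence[float],
    buffer: Sequence[Sequence[float]],
    k: int,
) -> List[Tuple[int, float]]:
    """Return (buffer index, similarity) for the k entries most similar to query, best first."""
    if k <= 0:
        return []
    scored = ((i, cosine_similarity(query, emb)) for i, emb in enumerate(buffer))
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy buffer of stored embeddings (hypothetical values, for illustration only).
buffer = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
    [-1.0, 0.0, 0.0],
]
query = [1.0, 0.1, 0.0]

top = retrieve_top_k(query, buffer, k=2)
top_indices = [i for i, _ in top]
print(top_indices)
```

In a setup like the one the abstract describes, the returned indices would be used to look up the stored experiences that augment a training batch. A linear scan is used here for clarity; a larger buffer would likely call for a vectorized or approximate nearest-neighbor search.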

Shahram Najam Syed, Yatharth Ahuja, Arthur Jakobsson, Jeff Ichnowski • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Robot Manipulation | LIBERO | Goal Achievement: 98.1 | 700 |
| Robot Manipulation | Physical Robot In-Distribution Tasks | Place White Mug in Bowl Success Rate: 0.9667 | 4 |
| Robot Manipulation | Physical Robot Out-of-Distribution Tasks | Success Rate, Place Mug (New BG): 90 | 4 |