DAM-VLA: A Dynamic Action Model-Based Vision-Language-Action Framework for Robot Manipulation
About
In dynamic environments such as warehouses, hospitals, and homes, robots must seamlessly transition between gross motion and precise manipulations to complete complex tasks. However, current Vision-Language-Action (VLA) frameworks, largely adapted from pre-trained Vision-Language Models (VLMs), often struggle to reconcile general task adaptability with the specialized precision required for intricate manipulation. To address this challenge, we propose DAM-VLA, a dynamic action model-based VLA framework. DAM-VLA integrates VLM reasoning with diffusion-based action models specialized for arm and gripper control. Specifically, it introduces (i) an action routing mechanism, using task-specific visual and linguistic cues to select appropriate action models (e.g., arm movement or gripper manipulation), (ii) a dynamic action model that fuses high-level VLM cognition with low-level visual features to predict actions, and (iii) a dual-scale action weighting mechanism that enables dynamic coordination between the arm-movement and gripper-manipulation models. Across extensive evaluations, DAM-VLA achieves superior success rates compared to state-of-the-art VLA methods in simulated (SIMPLER, FurnitureBench) and real-world settings, showing robust generalization from standard pick-and-place to demanding long-horizon and contact-rich tasks.
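The abstract does not say how the routing or weighting is computed, so the following is only a toy sketch of the general idea of softly routing between two action heads. It is not the DAM-VLA implementation. The function names, the two-element score vector, and the convex blending rule are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_and_blend(
    router_scores: Sequence[float],
    arm_action: Sequence[float],
    gripper_action: Sequence[float],
) -> Tuple[Tuple[float, float], List[float]]:
    """Hypothetical soft router: turn two scores into weights, then blend.

    router_scores  -- assumed [arm_score, gripper_score], e.g. derived from
                      visual and linguistic cues (how is not specified here)
    arm_action     -- action vector proposed by a hypothetical arm-movement head
    gripper_action -- action vector proposed by a hypothetical gripper head
    """
    w_arm, w_grip = softmax(router_scores)
    # Convex combination of the two proposals (an assumed blending rule).
    blended = [w_arm * a + w_grip * g for a, g in zip(arm_action, gripper_action)]
    return (w_arm, w_grip), blended


# Equal scores give equal weights, so the blend is the midpoint.
weights, action = route_and_blend([0.0, 0.0], [1.0, 0.0], [0.0, 1.0])
```

In this sketch a strongly positive arm score pushes the output toward the arm head's proposal, and a strongly positive gripper score does the opposite. A real system would presumably learn these scores rather than take them as inputs.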
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Robotic Manipulation | SIMPLER Visual Matching WidowX robot | Put Spoon on Towel Score | 88 | 51 |
| Robotic Manipulation | SIMPLER Google Robot VA | Pick Up Coke Can Success Rate | 98 | 35 |
| Robot Manipulation | SIMPLER Google robot, Visual Matching setting (test) | Success Rate (PCC) | 96 | 10 |
| Pick-&-Place | Real-world Robot Pick-and-place Average | Success Rate | 86.8 | 5 |
| One-Leg assembly | FurnitureBench | Step 1 Success Rate | 100 | 3 |
| Pick-&-Place | Real-world In-Distribution | Success Rate | 91.4 | 2 |
| Pick-&-Place | Real-world Out-of-Distribution | Success Rate | 82.2 | 2 |