ACG: Action Coherence Guidance for Flow-based Vision-Language-Action models
About
Diffusion and flow matching models have emerged as powerful robot policies, enabling Vision-Language-Action (VLA) models to generalize across diverse scenes and instructions. Yet, when trained via imitation learning, their high generative capacity makes them sensitive to noise in human demonstrations: jerks, pauses, and jitter that reduce action coherence. This loss of coherence causes instability and trajectory drift during deployment, failures that are catastrophic in fine-grained manipulation, where precision is crucial. In this paper, we present Action Coherence Guidance (ACG) for VLA models, a training-free, test-time guidance algorithm that improves action coherence and thereby yields performance gains. Evaluated on RoboCasa, DexMimicGen, and real-world SO-101 tasks, ACG consistently improves action coherence and boosts success rates across diverse manipulation tasks. Code and project page are available at https://github.com/DAVIAN-Robotics/ACG and https://DAVIAN-Robotics.github.io/ACG , respectively.
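The abstract does not spell out ACG's guidance term, so the snippet below is only a minimal sketch of what training-free, test-time guidance on a flow-matching action policy can look like in general. The names `velocity_fn`, `prev_chunk`, and `guidance_scale`, and the use of the previously executed action chunk as a coherence reference, are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def guided_flow_sampling(velocity_fn, obs, prev_chunk,
                         num_steps=10, guidance_scale=1.0):
    """Euler integration of a flow-matching action policy with an added
    test-time guidance term (illustrative only).

    velocity_fn(obs, actions, t) -- hypothetical stand-in for the policy's
        learned velocity field over an action chunk.
    prev_chunk -- previously executed action chunk, used here as a simple
        coherence reference (an assumption, not ACG's definition).
    """
    actions = np.random.randn(*prev_chunk.shape)  # start from noise
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = step * dt
        v = velocity_fn(obs, actions, t)  # model-predicted velocity
        # Illustrative guidance: nudge the flow toward the previous chunk,
        # discouraging abrupt jumps between consecutive action chunks.
        coherence_dir = prev_chunk - actions
        actions = actions + dt * (v + guidance_scale * coherence_dir)
    return actions
```

Because the correction is applied only during sampling, the policy's weights are untouched, which is what makes this style of guidance training-free.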
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Robotic Manipulation | RoboCasa | -- | -- | 28 |
| Robot Manipulation | DexMG | Success Rate | 44 | 8 |
| Robot Manipulation | Three Strawberries SO-101 | Success Rate | 74.4 | 8 |
| Robot Manipulation | Tic-Tac-Toe SO-101 | Success Rate | 56.7 | 8 |
| Robot Manipulation | Average Across Simulation and Real-world | Success Rate | 53.6 | 8 |