TA-VLA: Elucidating the Design Space of Torque-aware Vision-Language-Action Models

About

Many robotic manipulation tasks require sensing and responding to force signals such as torque to assess whether the task has been successfully completed and to enable closed-loop control. However, current Vision-Language-Action (VLA) models lack the ability to integrate such subtle physical feedback. In this work, we explore Torque-aware VLA models, aiming to bridge this gap by systematically studying the design space for incorporating torque signals into existing VLA architectures. We identify and evaluate several strategies, leading to three key findings. First, introducing torque adapters into the decoder consistently outperforms inserting them into the encoder. Third, inspired by joint prediction and planning paradigms in autonomous driving, we propose predicting torque as an auxiliary output, which further improves performance. This strategy encourages the model to build a physically grounded internal representation of interaction dynamics. Extensive quantitative and qualitative experiments across contact-rich manipulation benchmarks validate our findings.
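
The auxiliary-output idea in the abstract can be illustrated with a small sketch of a combined training objective: an action loss plus a weighted torque-prediction loss. This is only an illustration under stated assumptions. The abstract does not specify the loss functions or the weighting, so the mean-squared-error terms, the function names `mse` and `joint_loss`, and the `torque_weight` parameter are all hypothetical.

```python
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equal-length, non-empty sequences."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and equal in length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def joint_loss(
    pred_action: Sequence[float],
    target_action: Sequence[float],
    pred_torque: Sequence[float],
    target_torque: Sequence[float],
    torque_weight: float = 0.1,  # hypothetical weight, not taken from the paper
) -> float:
    """Action loss plus a weighted auxiliary torque-prediction loss.

    Assumption: both terms are plain MSE. The source only states that torque
    is predicted as an auxiliary output, not how that output is supervised.
    """
    action_loss = mse(pred_action, target_action)
    torque_loss = mse(pred_torque, target_torque)
    return action_loss + torque_weight * torque_loss


# Example call with made-up numbers (two action dimensions, one torque value).
example = joint_loss([1.0, 2.0], [1.0, 4.0], [0.5], [1.5], torque_weight=0.5)
```

In a sketch like this, setting `torque_weight` to zero recovers an action-only objective, which makes it easy to compare training with and without the auxiliary torque term.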

Zongzheng Zhang, Haobo Xu, Zhuo Yang, Chenghao Yue, Zehao Lin, Huan-ang Gao, Ziwei Wang, Hao Zhao • 2025

Related benchmarks

Task	Dataset	Result
Contact-rich manipulation	Consolidated real-world manipulation dataset (eval)	Unstack Cup Success Rate: 29.2	14
Charger Plugging	Charger Plugging	Success Rate (SR): 80	11
Cucumber Peeling	Real-world visuo-tactile dataset	Success Rate: 17	10
Robotic Manipulation	Dataset B force condition 1.0 (Firm)	Success Rate (SR): 82	9
Robotic Manipulation	Dataset B Gentle force condition 1.0	Success Rate (SR): 31	9
Flip	Robotic Manipulation Generalization Evaluation (test)	Success Rate: 62.5	7
Push and Flip	Push and Flip (Unseen Object 5)	Success Rate: 3	7
Push and Flip	Push and Flip (Unseen Object 1)	Success Rate: 60	7
Push and Flip	Push and Flip (Unseen Object 2)	Success Rate: 0.4	7
Push and Flip	Push and Flip (Unseen Object 3)	Success Rate: 20	7

Showing 10 of 28 rows
