
Body Transformer: Leveraging Robot Embodiment for Policy Learning

About

In recent years, the transformer architecture has become the de facto standard for machine learning algorithms applied to natural language processing and computer vision. Despite notable evidence of successful deployment of this architecture in the context of robot learning, we claim that vanilla transformers do not fully exploit the structure of the robot learning problem. Therefore, we propose Body Transformer (BoT), an architecture that leverages the robot embodiment by providing an inductive bias that guides the learning process. We represent the robot body as a graph of sensors and actuators, and rely on masked attention to pool information throughout the architecture. The resulting architecture outperforms the vanilla transformer, as well as the classical multilayer perceptron, in terms of task completion, scaling properties, and computational efficiency when representing either imitation or reinforcement learning policies. Additional material including the open-source code is available at https://sferrazza.cc/bot_site.
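The masked-attention idea in the abstract can be illustrated with a short sketch: the robot body is a graph of sensor/actuator nodes, and each node may only attend to nodes within a fixed number of hops in that graph. The helper names (`body_graph_mask`, `masked_attention`), the hop-based mask construction, and all parameters below are illustrative assumptions, not the paper's exact implementation; see the linked site for the authors' code.

```python
import numpy as np

def body_graph_mask(edges, num_nodes, hops=1):
    # Build a binary attention mask from the body graph: node i may attend
    # to node j only if j is reachable within `hops` edges (self included).
    # Hypothetical helper; the paper's exact masking scheme may differ.
    adj = np.eye(num_nodes, dtype=bool)
    for i, j in edges:
        adj[i, j] = adj[j, i] = True
    mask = adj.copy()
    for _ in range(hops - 1):
        # Expand reachability by one hop via boolean matrix product.
        mask = (mask.astype(int) @ adj.astype(int)) > 0
    return mask

def masked_attention(q, k, v, mask):
    # Scaled dot-product attention; disallowed node pairs get -inf scores,
    # so their softmax weight is exactly zero.
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores = np.where(mask, scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Example: a 3-link chain 0-1-2. With hops=1, node 0 cannot attend to node 2;
# stacking layers (or increasing hops) lets information propagate further.
mask = body_graph_mask(edges=[(0, 1), (1, 2)], num_nodes=3, hops=1)
```

Stacking several such masked layers pools information progressively along the body graph, which is the inductive bias the architecture relies on.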

Carmelo Sferrazza, Dun-Ming Huang, Fangchen Liu, Jongmin Lee, Pieter Abbeel• 2024

Related benchmarks

Task               | Dataset                                              | Result                           | Rank
Mass Generalization | Go2 1.5–2.0× mass                                   | Retention Rate: 69.7             | 6
Reinforcement Learning | Genesis T1 + G1 + Go1 + Go2                      | IQM: 0.69                        | 6
Reinforcement Learning | SAPIEN Humanoid + Hopper                         | IQM: 0.66                        | 6
Mass Generalization | MuJoCo Humanoid 1.5–2.0× mass                       | Retention Rate: 17               | 6
Autonomous Driving  | CARLA Vehicles (27 vehicles) (in-distribution)      | Average Driving Score (DS): 36.92 | 5
Mass Generalization | T1 1.1–1.5× mass                                    | Retention Rate: 57               | 4
Autonomous Driving  | CARLA Average over 31 vehicles (out-of-distribution) | Average Driving Score (DS): 23.21 | 4
