DMAP: a Distributed Morphological Attention Policy for Learning to Locomote with a Changing Body

About

Biological and artificial agents need to deal with constant changes in the real world. We study this problem in four classical continuous control environments, augmented with morphological perturbations. Learning to locomote when the length and the thickness of different body parts vary is challenging, as the control policy is required to adapt to the morphology to successfully balance and advance the agent. We show that a control policy based on the proprioceptive state performs poorly with highly variable body configurations, while an (oracle) agent with access to a learned encoding of the perturbation performs significantly better. We introduce DMAP, a biologically-inspired, attention-based policy network architecture. DMAP combines independent proprioceptive processing, a distributed policy with individual controllers for each joint, and an attention mechanism, to dynamically gate sensory information from different body parts to different controllers. Despite not having access to the (hidden) morphology information, DMAP can be trained end-to-end in all the considered environments, overall matching or surpassing the performance of an oracle agent. Thus DMAP, implementing principles from biological motor control, provides a strong inductive bias for learning challenging sensorimotor tasks. Overall, our work corroborates the power of these principles in challenging locomotion tasks.

Alberto Silvio Chiappa, Alessandro Marin Vargas, Alexander Mathis • 2022
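
The abstract names three ingredients: independent proprioceptive processing, one controller per joint, and an attention mechanism that gates sensory information from body parts to controllers. The toy sketch below shows how these pieces could fit together in plain Python. It is not the authors' implementation: the class `ToyGatedPolicy`, the layer sizes, the random weights, and the way attention logits are computed (a linear score per encoding) are all assumptions made for illustration.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, vec):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class ToyGatedPolicy:
    """Hypothetical toy policy; all names and sizes are illustrative."""

    def __init__(self, n_joints, obs_dim, feat_dim, seed=0):
        rng = random.Random(seed)
        # One independent encoder per body part's proprioceptive input.
        self.encoders = [random_matrix(feat_dim, obs_dim, rng)
                         for _ in range(n_joints)]
        # One scorer per controller (assumption): encoding -> scalar logit.
        self.scorers = [random_matrix(1, feat_dim, rng)
                        for _ in range(n_joints)]
        # One controller per joint: gated features -> scalar action.
        self.controllers = [random_matrix(1, feat_dim, rng)
                            for _ in range(n_joints)]

    def act(self, observations):
        # Encode each body part separately.
        feats = [[math.tanh(x) for x in linear(enc, obs)]
                 for enc, obs in zip(self.encoders, observations)]
        actions, weights = [], []
        for scorer, ctrl in zip(self.scorers, self.controllers):
            # Attention weights of this controller over all body parts.
            w_row = softmax([linear(scorer, f)[0] for f in feats])
            # Gated input: attention-weighted sum of the encodings.
            gated = [sum(w * f[k] for w, f in zip(w_row, feats))
                     for k in range(len(feats[0]))]
            actions.append(math.tanh(linear(ctrl, gated)[0]))
            weights.append(w_row)
        return actions, weights


# Usage: three joints, each with a two-dimensional proprioceptive reading.
policy = ToyGatedPolicy(n_joints=3, obs_dim=2, feat_dim=4)
obs = [[0.1, -0.2], [0.3, 0.0], [-0.5, 0.4]]
actions, weights = policy.act(obs)
```

Each row of `weights` sums to one, so every controller receives a convex combination of the per-part encodings. In a trained policy these weights would be learned rather than random.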

Related benchmarks

Task	Dataset	Metric	Result
Locomotion	Ant IID (test)	Mean Episode Reward	2.24e+3	24
Locomotion Control	Ant sigma 0.1 (test)	Episode Reward	2.24e+3	24
Locomotion Control	Half Cheetah sigma 0.3 (test)	Episode Reward	1.58e+3	24
Locomotion	Half Cheetah IID (test)	Mean Episode Reward	2.26e+3	24
Locomotion	Hopper IID (test)	Mean Episode Reward	1.84e+3	24
Locomotion Control	Ant sigma 0.5 (test)	Episode Reward	960	24
Locomotion Control	Hopper sigma 0.3 (test)	Episode Reward	1.32e+3	24
Locomotion	Walker IID (test)	Mean Episode Reward	1.23e+3	24
Locomotion Control	Walker sigma 0.1 (test)	Episode Reward	1.23e+3	24
Locomotion Control	Ant sigma 0.3 (test)	Episode Reward	1.62e+3	24

Showing 10 of 30 rows
