DMAP: a Distributed Morphological Attention Policy for Learning to Locomote with a Changing Body

About

Biological and artificial agents need to deal with constant changes in the real world. We study this problem in four classical continuous control environments, augmented with morphological perturbations. Learning to locomote when the length and the thickness of different body parts vary is challenging, as the control policy is required to adapt to the morphology to successfully balance and advance the agent. We show that a control policy based on the proprioceptive state performs poorly with highly variable body configurations, while an (oracle) agent with access to a learned encoding of the perturbation performs significantly better. We introduce DMAP, a biologically-inspired, attention-based policy network architecture. DMAP combines independent proprioceptive processing, a distributed policy with individual controllers for each joint, and an attention mechanism, to dynamically gate sensory information from different body parts to different controllers. Despite not having access to the (hidden) morphology information, DMAP can be trained end-to-end in all the considered environments, overall matching or surpassing the performance of an oracle agent. Thus DMAP, implementing principles from biological motor control, provides a strong inductive bias for learning challenging sensorimotor tasks. Overall, our work corroborates the power of these principles in challenging locomotion tasks.
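The three ingredients named in the abstract (independent proprioceptive processing, one controller per joint, and attention that gates sensory information toward each controller) can be illustrated with a toy forward pass. This is only a minimal sketch under stated assumptions, not the authors' implementation: the class name `AttentionGatedPolicy`, the layer sizes, the linear controllers, and the random untrained weights are all hypothetical choices made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


class AttentionGatedPolicy:
    """Toy distributed policy (hypothetical, untrained).

    Each sensor channel is encoded on its own, each joint has its own
    controller, and a per-joint attention distribution decides how much
    of each sensor embedding reaches that controller.
    """

    def __init__(self, n_sensors, n_joints, embed_dim=8, seed=0):
        rng = np.random.default_rng(seed)
        # Independent processing: one small encoder per sensor channel
        # (scalar reading -> embed_dim features).
        self.enc_w = rng.normal(0.0, 0.5, size=(n_sensors, embed_dim))
        self.enc_b = np.zeros((n_sensors, embed_dim))
        # One attention query vector per joint.
        self.att_w = rng.normal(0.0, 0.5, size=(n_joints, embed_dim))
        # One linear controller per joint (assumption: real controllers
        # would likely be deeper networks).
        self.ctrl_w = rng.normal(0.0, 0.5, size=(n_joints, embed_dim))
        self.ctrl_b = np.zeros(n_joints)

    def attention(self, obs):
        """Return sensor embeddings and the (n_joints, n_sensors) gate."""
        feats = np.tanh(obs[:, None] * self.enc_w + self.enc_b)
        logits = self.att_w @ feats.T
        # Each joint's weights over the sensors sum to 1.
        return feats, softmax(logits, axis=1)

    def act(self, obs):
        """Map a proprioceptive observation to one action per joint."""
        feats, weights = self.attention(obs)
        gated = weights @ feats  # (n_joints, embed_dim)
        return np.tanh(np.sum(gated * self.ctrl_w, axis=1) + self.ctrl_b)


# Example sizes are arbitrary.
policy = AttentionGatedPolicy(n_sensors=11, n_joints=3)
obs = np.linspace(-1.0, 1.0, 11)
feats, weights = policy.attention(obs)
action = policy.act(obs)
```

Because the attention weights are computed from the current sensor embeddings, the gating changes with the observation, which is the sense in which the abstract calls it dynamic. In this sketch nothing is trained; the point is only the data flow from per-sensor encoders, through per-joint gating, to per-joint outputs.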

Alberto Silvio Chiappa, Alessandro Marin Vargas, Alexander Mathis • 2022

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Locomotion | Ant IID (test) | Mean Episode Reward | 2.24e+3 | 24 |
| Locomotion Control | Ant sigma 0.1 (test) | Episode Reward | 2.24e+3 | 24 |
| Locomotion Control | Half Cheetah sigma 0.3 (test) | Episode Reward | 1.58e+3 | 24 |
| Locomotion | Half Cheetah IID (test) | Mean Episode Reward | 2.26e+3 | 24 |
| Locomotion | Hopper IID (test) | Mean Episode Reward | 1.84e+3 | 24 |
| Locomotion Control | Ant sigma 0.5 (test) | Episode Reward | 960 | 24 |
| Locomotion Control | Hopper sigma 0.3 (test) | Episode Reward | 1.32e+3 | 24 |
| Locomotion | Walker IID (test) | Mean Episode Reward | 1.23e+3 | 24 |
| Locomotion Control | Walker sigma 0.1 (test) | Episode Reward | 1.23e+3 | 24 |
| Locomotion Control | Ant sigma 0.3 (test) | Episode Reward | 1.62e+3 | 24 |
Showing 10 of 30 rows
