Smooth Exploration for Robotic Reinforcement Learning

About

Reinforcement learning (RL) enables robots to learn skills from interactions with the real world. In practice, the unstructured step-based exploration used in Deep RL, often very successful in simulation, leads to jerky motion patterns on real robots. The resulting shaky behavior causes poor exploration or even damage to the robot. We address these issues by adapting state-dependent exploration (SDE) to current Deep RL algorithms. To enable this adaptation, we propose two extensions to the original SDE: using more general features and re-sampling the noise periodically. Together, these extensions yield a new exploration method, generalized state-dependent exploration (gSDE). We evaluate gSDE both in simulation, on PyBullet continuous control tasks, and directly on three different real robots: a tendon-driven elastic robot, a quadruped and an RC car. The noise sampling interval of gSDE provides a trade-off between performance and smoothness, which allows training directly on the real robots without loss of performance. The code is available at https://github.com/DLR-RM/stable-baselines3.
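The core gSDE mechanism from the abstract can be sketched in a few lines of Python. This is a minimal, hypothetical sketch, not the Stable-Baselines3 implementation. It assumes exploration noise of the form eps(s) = theta_eps · z(s), where theta_eps is a Gaussian matrix re-sampled every `sample_freq` steps, and it keeps the noise scale `sigma` fixed (in gSDE it is learned). The smooth features `features(t)` stand in for the policy's features, and the names `GSDENoise`, `sample_freq`, `jerkiness` and `rollout_noise` are illustrative. The sketch only models the noise added to the policy's mean action and compares how much it changes between consecutive steps against unstructured per-step Gaussian noise.

```python
import math
import random


class GSDENoise:
    """Hypothetical sketch of gSDE exploration noise: eps(s) = theta_eps @ z(s).

    theta_eps is an (action_dim x feature_dim) matrix with i.i.d. N(0, sigma^2)
    entries. It is re-sampled every `sample_freq` steps; between re-samples the
    noise is a deterministic function of the features z(s).
    """

    def __init__(self, feature_dim, action_dim, sigma=0.5, sample_freq=4, seed=0):
        self.feature_dim = feature_dim
        self.action_dim = action_dim
        self.sigma = sigma  # fixed here for simplicity (assumption)
        self.sample_freq = sample_freq  # <= 0: sample once, never re-sample
        self.rng = random.Random(seed)
        self.steps = 0
        self.theta = self._sample_theta()

    def _sample_theta(self):
        return [
            [self.rng.gauss(0.0, self.sigma) for _ in range(self.feature_dim)]
            for _ in range(self.action_dim)
        ]

    def __call__(self, features):
        # Re-sample theta_eps at the start of every new sampling interval.
        if self.sample_freq > 0 and self.steps > 0 and self.steps % self.sample_freq == 0:
            self.theta = self._sample_theta()
        self.steps += 1
        # Matrix-vector product theta_eps @ z(s), one entry per action dimension.
        return [sum(w * z for w, z in zip(row, features)) for row in self.theta]


def features(t):
    # Hypothetical smooth features z(s_t); in gSDE these would come from the
    # policy network rather than from a hand-written function.
    return [math.sin(0.05 * t), math.cos(0.05 * t), 1.0]


def jerkiness(seq):
    # Mean absolute change of a 1-D signal between consecutive steps.
    return sum(abs(b - a) for a, b in zip(seq, seq[1:])) / (len(seq) - 1)


def rollout_noise(sample_freq, steps=1000, sigma=0.5, seed=0):
    # 1-D gSDE noise sequence along a trajectory with smoothly changing features.
    noise = GSDENoise(feature_dim=3, action_dim=1, sigma=sigma,
                      sample_freq=sample_freq, seed=seed)
    return [noise(features(t))[0] for t in range(steps)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Baseline: unstructured step-based Gaussian noise, independent at every step.
    step_based = [rng.gauss(0.0, 0.5) for _ in range(1000)]
    print("unstructured   :", round(jerkiness(step_based), 3))
    for freq in (1, 8, 64):
        print(f"gSDE freq={freq:<3} :", round(jerkiness(rollout_noise(freq)), 3))
```

Larger re-sampling intervals give smoother noise, while re-sampling every step makes it as jerky as unstructured noise, which mirrors the performance/smoothness trade-off described above. In Stable-Baselines3 itself, gSDE is enabled with `use_sde=True` on algorithms such as SAC and PPO, and the `sde_sample_freq` argument sets the re-sampling interval (the default of -1 samples only at the beginning of each rollout).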

Antonin Raffin, Jens Kober, Freek Stulp · 2020

Related benchmarks

Task	Dataset	Metric	Result	#
Locomotion	PyBullet Walker	Energy Consumption	0.25	8
Locomotion	PyBullet Humanoid	Energy Consumption	0.11	8
Locomotion	PyBullet Ant	Energy Consumption	0.23	8
Locomotion	PyBullet Hopper	Energy	0.23	8
Locomotion	PyBullet Half cheetah	Energy Consumption	0.23	8
Elbow Pose	MyoSuite	Energy	0.18	4
Finger Pose	MyoSuite	Energy	0.02	4
Baoding	MyoSuite (test)	Energy	0.07	4
Hand reach	MyoSuite (test)	Energy	0.07	4
Finger reach	MyoSuite Finger reach (N=5 seeds)	Energy	0.07	4

