Sample-efficient Cross-Entropy Method for Real-time Planning

About

Trajectory optimizers for model-based reinforcement learning, such as the Cross-Entropy Method (CEM), can yield compelling results even in high-dimensional control tasks and sparse-reward environments. However, their sampling inefficiency prevents them from being used for real-time planning and control. We propose an improved version of the CEM algorithm for fast planning, with novel additions including temporally-correlated actions and memory, requiring 2.7-22x fewer samples and yielding a performance increase of 1.2-10x in high-dimensional control problems.
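The abstract names two additions to standard CEM: temporally-correlated action sampling and memory. Below is a minimal pure-Python sketch of both, under stated assumptions. Temporal correlation is modeled here with a simple AR(1) filter, which is a stand-in and not necessarily the noise model used in the paper. "Memory" is taken to mean carrying a fraction of the previous iteration's elites into the next candidate pool. The function names `correlated_noise` and `cem_plan` and all parameters (`rho`, `keep_frac`, `init_std`, and so on) are hypothetical.

```python
import math
import random
from typing import Callable, List

Plan = List[List[float]]  # horizon x action_dim


def correlated_noise(horizon: int, dim: int, rho: float, rng: random.Random) -> Plan:
    """AR(1) noise with unit marginal variance: e_t = rho*e_{t-1} + sqrt(1-rho^2)*w_t.

    Assumed stand-in for temporally-correlated sampling: rho=0 gives white
    noise, rho close to 1 gives smooth, slowly varying action sequences.
    """
    scale = math.sqrt(1.0 - rho * rho)
    e = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    seq = []
    for _ in range(horizon):
        seq.append(e)
        e = [rho * x + scale * rng.gauss(0.0, 1.0) for x in e]
    return seq


def cem_plan(cost_fn: Callable[[Plan], float], horizon: int, dim: int,
             n_samples: int = 32, n_elites: int = 4, n_iters: int = 5,
             rho: float = 0.8, keep_frac: float = 0.25, init_std: float = 0.5,
             lo: float = -1.0, hi: float = 1.0, seed: int = 0) -> Plan:
    """Return the lowest-cost action sequence found by this CEM sketch (n_iters >= 1)."""
    rng = random.Random(seed)
    mean = [[0.0] * dim for _ in range(horizon)]
    std = [[init_std] * dim for _ in range(horizon)]
    kept: List[Plan] = []  # assumed "memory": elites carried into the next iteration
    elites: List[Plan] = []
    for _ in range(n_iters):
        candidates = []
        for _ in range(n_samples):
            noise = correlated_noise(horizon, dim, rho, rng)
            # Sample around the current mean and clip to the action bounds.
            plan = [[min(hi, max(lo, mean[t][d] + std[t][d] * noise[t][d]))
                     for d in range(dim)] for t in range(horizon)]
            candidates.append(plan)
        candidates.extend(kept)        # re-evaluate remembered elites
        candidates.sort(key=cost_fn)   # lower cost is better
        elites = candidates[:n_elites]
        kept = elites[:max(1, int(keep_frac * n_elites))]
        # Refit the per-timestep Gaussian to the elite set.
        for t in range(horizon):
            for d in range(dim):
                vals = [e[t][d] for e in elites]
                m = sum(vals) / len(vals)
                var = sum((v - m) ** 2 for v in vals) / len(vals)
                mean[t][d] = m
                std[t][d] = max(math.sqrt(var), 1e-3)  # floor avoids collapse to zero
    return elites[0]


def cost(plan: Plan) -> float:
    # Toy objective (hypothetical): track a constant action of 0.3 at every step.
    return sum((a - 0.3) ** 2 for step in plan for a in step)


best = cem_plan(cost, horizon=5, dim=1)
```

Because the remembered elites are re-evaluated every iteration, the best cost returned by this sketch can only stay the same or improve as `n_iters` grows for a fixed seed, which illustrates how memory avoids discarding good samples that were already paid for.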

Cristina Pinneri, Shambhuraj Sawant, Sebastian Blaes, Jan Achterhold, Joerg Stueckler, Michal Rolinek, Georg Martius (2020)

Related benchmarks

Task	Dataset	Metric	Result
Robotic Grasping	Simulation wide friction regime	Success Rate (%)	79	5
Robotic Grasping	Simulation friction regime (nominal)	Success Rate (%)	100	5
Robotic Grasping	Simulation bimodal friction regime	Success Rate (%)	42	5
