
Regularity as Intrinsic Reward for Free Play

About

We propose regularity as a novel reward signal for intrinsically-motivated reinforcement learning. Taking inspiration from child development, we postulate that striving for structure and order helps guide exploration towards a subspace of tasks that are not favored by naive uncertainty-based intrinsic rewards. Our generalized formulation of Regularity as Intrinsic Reward (RaIR) allows us to operationalize it within model-based reinforcement learning. In a synthetic environment, we showcase the plethora of structured patterns that can emerge from pursuing this regularity objective. We also demonstrate the strength of our method in a multi-object robotic manipulation environment. We incorporate RaIR into free play and use it to complement the model's epistemic uncertainty as an intrinsic reward. Doing so, we witness the autonomous construction of towers and other regular structures during free play, which leads to a substantial improvement in zero-shot downstream task performance on assembly tasks.
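The page does not reproduce the paper's exact formulation, but the idea of rewarding "structure and order" can be illustrated with a minimal sketch. Below, regularity is scored as the negative entropy of the empirical distribution of discretized pairwise object offsets, so configurations with many repeated relative placements (a tower, a line) score higher than scattered ones. The function name, the bin size, and the choice of relative positions as the relation are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from collections import Counter

def regularity_reward(positions, bin_size=0.05):
    """Toy regularity score: negative entropy of the empirical
    distribution of discretized pairwise offsets between objects.
    Repeated relative placements -> fewer distinct relations ->
    lower entropy -> higher reward. `bin_size` is an assumed
    discretization step, not a value from the paper."""
    n = len(positions)
    relations = []
    for i in range(n):
        for j in range(n):
            if i != j:
                # Discretize the relative offset between objects i and j.
                delta = np.round((positions[i] - positions[j]) / bin_size)
                relations.append(tuple(delta.astype(int)))
    counts = Counter(relations)
    total = sum(counts.values())
    probs = np.array([c / total for c in counts.values()])
    entropy = -np.sum(probs * np.log(probs))
    return -entropy  # 0 (maximal) when all pairwise relations coincide

# A regular line of evenly spaced objects scores higher than a random pile:
line = np.array([[0.0, 0.0, 0.1 * k] for k in range(4)])
scatter = np.random.default_rng(0).uniform(0.0, 1.0, size=(4, 3))
print(regularity_reward(line) > regularity_reward(scatter))  # → True
```

In the paper's free-play setting this kind of regularity signal is combined with the model's epistemic uncertainty, so the agent both seeks novelty and is drawn toward orderly configurations such as towers.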

Cansu Sancaktar, Justus Piater, Georg Martius • 2023

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Multitower 2+2 Assembly | CONSTRUCTION | Success Rate | 77 | 6 |
| Pyramid 5 Assembly | CONSTRUCTION | Success Rate | 49 | 6 |
| Pyramid 6 Assembly | CONSTRUCTION | Success Rate | 18 | 6 |
| Singletower 3 Assembly | CONSTRUCTION | Success Rate | 0.75 | 6 |
| Pick&Place 6 | CONSTRUCTION | Success Rate | 90 | 6 |
| Throw 4 | CONSTRUCTION | Success Rate | 0.32 | 6 |
| Flip 4 | CONSTRUCTION | Success Rate | 65 | 6 |
| Downstream task generalization | Quadruped RoboYoga (test) | Stand Leg Up | 89 | 2 |
| Stack Cube + Ball | custom CONSTRUCTION zero-shot | Success Rate | 66 | 2 |
| Stack Cube + Column + Ball | custom CONSTRUCTION zero-shot | Success Rate | 15 | 2 |
