
Regularity as Intrinsic Reward for Free Play

About

We propose regularity as a novel reward signal for intrinsically-motivated reinforcement learning. Taking inspiration from child development, we postulate that striving for structure and order helps guide exploration towards a subspace of tasks that are not favored by naive uncertainty-based intrinsic rewards. Our generalized formulation of Regularity as Intrinsic Reward (RaIR) allows us to operationalize it within model-based reinforcement learning. In a synthetic environment, we showcase the plethora of structured patterns that can emerge from pursuing this regularity objective. We also demonstrate the strength of our method in a multi-object robotic manipulation environment. We incorporate RaIR into free play and use it to complement the model's epistemic uncertainty as an intrinsic reward. Doing so, we witness the autonomous construction of towers and other regular structures during free play, which leads to a substantial improvement in zero-shot downstream task performance on assembly tasks.
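The page does not reproduce the paper's exact formulation, but the idea of rewarding "structure and order" can be illustrated with a minimal sketch. Below, regularity is scored as the negative entropy of the empirical distribution of discretized pairwise object offsets, so configurations with many repeated relative placements (a tower, a line) score higher than scattered ones. The function name, the bin size, and the choice of relative positions as the relation are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from collections import Counter

def regularity_reward(positions, bin_size=0.05):
    """Toy regularity score: negative entropy of the empirical
    distribution of discretized pairwise offsets between objects.
    Repeated relative placements -> fewer distinct relations ->
    lower entropy -> higher reward. `bin_size` is an assumed
    discretization step, not a value from the paper."""
    n = len(positions)
    relations = []
    for i in range(n):
        for j in range(n):
            if i != j:
                # Discretize the relative offset between objects i and j.
                delta = np.round((positions[i] - positions[j]) / bin_size)
                relations.append(tuple(delta.astype(int)))
    counts = Counter(relations)
    total = sum(counts.values())
    probs = np.array([c / total for c in counts.values()])
    entropy = -np.sum(probs * np.log(probs))
    return -entropy  # 0 (maximal) when all pairwise relations coincide

# A regular line of evenly spaced objects scores higher than a random pile:
line = np.array([[0.0, 0.0, 0.1 * k] for k in range(4)])
scatter = np.random.default_rng(0).uniform(0.0, 1.0, size=(4, 3))
print(regularity_reward(line) > regularity_reward(scatter))  # → True
```

In the paper's free-play setting this kind of regularity signal is combined with the model's epistemic uncertainty, so the agent both seeks novelty and is drawn toward orderly configurations such as towers.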

Cansu Sancaktar, Justus Piater, Georg Martius • 2023

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Multitower 2+2 Assembly | CONSTRUCTION | Success Rate | 77 | 6 |
| Pyramid 5 Assembly | CONSTRUCTION | Success Rate | 49 | 6 |
| Pyramid 6 Assembly | CONSTRUCTION | Success Rate | 18 | 6 |
| Singletower 3 Assembly | CONSTRUCTION | Success Rate | 0.75 | 6 |
| Pick&Place 6 | CONSTRUCTION | Success Rate | 90 | 6 |
| Throw 4 | CONSTRUCTION | Success Rate | 0.32 | 6 |
| Flip 4 | CONSTRUCTION | Success Rate | 65 | 6 |
| Downstream task generalization | Quadruped RoboYoga (test) | Stand Leg Up | 89 | 2 |
| Stack Cube + Ball | custom CONSTRUCTION zero-shot | Success Rate | 66 | 2 |
| Stack Cube + Column + Ball | custom CONSTRUCTION zero-shot | Success Rate | 15 | 2 |
