
SPRINT: Scalable Policy Pre-Training via Language Instruction Relabeling

About

Pre-training robot policies with a rich set of skills can substantially accelerate the learning of downstream tasks. Prior works have defined pre-training tasks via natural language instructions, but doing so requires tedious human annotation of hundreds of thousands of instructions. Thus, we propose SPRINT, a scalable offline policy pre-training approach which substantially reduces the human effort needed for pre-training a diverse set of skills. Our method uses two core ideas to automatically expand a base set of pre-training tasks: instruction relabeling via large language models and cross-trajectory skill chaining through offline reinforcement learning. As a result, SPRINT pre-training equips robots with a much richer repertoire of skills. Experimental results in a household simulator and on a real robot kitchen manipulation task show that SPRINT leads to substantially faster learning of new long-horizon tasks than previous pre-training approaches. Website at https://clvrai.com/sprint.
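The first core idea, instruction relabeling, can be sketched as follows. Assuming a trajectory is annotated with a sequence of per-segment language instructions (the function and data layout below are illustrative, not the authors' code), new pre-training tasks can be generated by merging every contiguous span of segments and asking a language model to summarize the concatenated instructions into one higher-level instruction:

```python
def relabel_trajectory(segments, summarize):
    """Given a trajectory split into (instruction, frames) segments,
    generate new pre-training tasks by merging each contiguous span of
    2+ segments and summarizing its instructions into one higher-level
    instruction via `summarize` (standing in for an LLM call)."""
    tasks = []
    n = len(segments)
    for i in range(n):
        for j in range(i + 2, n + 1):  # spans covering at least 2 segments
            span = segments[i:j]
            merged_instruction = summarize([ins for ins, _ in span])
            merged_frames = [f for _, frames in span for f in frames]
            tasks.append((merged_instruction, merged_frames))
    return tasks

# Stub summarizer standing in for the LLM; SPRINT itself prompts a
# large language model to paraphrase the combined instructions.
stub_summarize = lambda instrs: " then ".join(instrs)

segments = [("pick up the mug", [0, 1]),
            ("place it in the sink", [2, 3]),
            ("turn on the faucet", [4, 5])]
new_tasks = relabel_trajectory(segments, stub_summarize)
# yields 3 merged tasks: segments (1,2), (1,2,3), and (2,3)
```

Each merged span becomes an additional (instruction, trajectory-segment) pre-training task at no extra annotation cost, which is what lets the base task set expand automatically.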

Jesse Zhang, Karl Pertsch, Jiahui Zhang, Joseph J. Lim · 2023

Related benchmarks

Task             Dataset                  Metric                Result  Rank
Reward Modeling  EVAL_INSTRUCT 3 steps    Step Completion Rate  1.9     4
Reward Modeling  EVAL_INSTRUCT 4 steps    Step Completion Rate  2.25    4
Reward Modeling  EVAL_INSTRUCT 5 steps    Step Completion Rate  3.31    4
Reward Modeling  EVAL_INSTRUCT (overall)  Step Completion Rate  2.2     4
Reward Modeling  EVAL_INSTRUCT 2 steps    Step Completion Rate  1.35    4
