Reinforcement Learning with Action Chunking

About

We present Q-chunking, a simple yet effective recipe for improving reinforcement learning (RL) algorithms for long-horizon, sparse-reward tasks. Our recipe is designed for the offline-to-online RL setting, where the goal is to leverage an offline prior dataset to maximize the sample-efficiency of online learning. Effective exploration and sample-efficient learning remain central challenges in this setting, as it is not obvious how the offline data should be utilized to acquire a good exploratory policy. Our key insight is that action chunking, a technique popularized in imitation learning where sequences of future actions are predicted rather than a single action at each timestep, can be applied to temporal difference (TD)-based RL methods to mitigate the exploration challenge. Q-chunking adopts action chunking by directly running RL in a 'chunked' action space, enabling the agent to (1) leverage temporally consistent behaviors from offline data for more effective online exploration and (2) use unbiased $n$-step backups for more stable and efficient TD learning. Our experimental results demonstrate that Q-chunking exhibits strong offline performance and online sample efficiency, outperforming prior best offline-to-online methods on a range of long-horizon, sparse-reward manipulation tasks.
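To make the "chunked" backup concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a critic that scores a state together with a chunk of h actions, so the h rewards collected while executing that chunk can be summed and bootstrapped once from the state reached h steps later. The function names `chunked_td_target` and `to_chunk_transitions`, the `terminal` flag, and the choice of non-overlapping chunks are all illustrative assumptions, not taken from the paper.

```python
from typing import Any, List, Sequence, Tuple


def chunked_td_target(
    rewards: Sequence[float],
    next_q: float,
    gamma: float,
    terminal: bool = False,
) -> float:
    """h-step TD target for a critic defined over action chunks.

    rewards: the h rewards observed while executing one action chunk.
    next_q:  critic estimate at the state reached after the chunk
             (hypothetical input; how it is computed is not specified here).
    """
    h = len(rewards)
    # Discounted sum of the rewards collected inside the chunk.
    n_step_return = sum((gamma ** i) * r for i, r in enumerate(rewards))
    # Bootstrap once, h steps ahead, unless the episode ended.
    bootstrap = 0.0 if terminal else (gamma ** h) * next_q
    return n_step_return + bootstrap


def to_chunk_transitions(
    states: Sequence[Any],
    actions: Sequence[Any],
    rewards: Sequence[float],
    h: int,
) -> List[Tuple[Any, tuple, tuple, Any]]:
    """Slice one trajectory into (s_t, a_{t:t+h}, r_{t:t+h}, s_{t+h}) tuples.

    Assumption: non-overlapping chunks, chosen here only for simplicity.
    """
    assert len(states) == len(actions) + 1 == len(rewards) + 1
    transitions = []
    for t in range(0, len(actions) - h + 1, h):
        transitions.append(
            (
                states[t],
                tuple(actions[t:t + h]),
                tuple(rewards[t:t + h]),
                states[t + h],
            )
        )
    return transitions


if __name__ == "__main__":
    # Sparse reward arriving on the last step of a 4-action chunk.
    print(chunked_td_target([0.0, 0.0, 0.0, 1.0], next_q=2.0, gamma=0.5))  # 0.25
```

Because the critic in this sketch conditions on the whole chunk that produced the rewards, the summed rewards correspond to the actions being evaluated. That is the intuition behind calling the backup unbiased, in contrast to a standard single-action n-step return computed on off-policy data.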

Qiyang Li, Zhiyuan Zhou, Sergey Levine • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Robotic Manipulation | Robomimic Can | Success Rate: 94 | 16 |
| Robotic Manipulation | Robomimic Lift | Success Rate: 100 | 14 |
| Robot goal-reaching success rate evaluation | OGBench scene-play-sparse-singletask | Success Rate: 87 | 13 |
| Robot goal-reaching success rate evaluation | OGBench cube-single-play-singletask | Success Rate: 98 | 13 |
| Robot goal-reaching success rate evaluation | OGBench cube-double-play-singletask | Success Rate (%): 39 | 13 |
| Robot goal-reaching success rate evaluation | OGBench puzzle-4x4-play-sparse-singletask | Success Rate: 22 | 13 |
| Robot goal-reaching success rate evaluation | OGBench puzzle-3x3-play-sparse-singletask | Success Rate: 39 | 13 |
| Robotic Manipulation | Robomimic Square | Success Rate: 92 | 12 |
| Offline Reinforcement Learning | D4RL | Walker2d (Medium Expert) Score: 102.8 | 11 |
| Offline Goal-Conditioned Reinforcement Learning | humanoidmaze giant | Success Rate: 4.80e+3 | 10 |
Showing 10 of 34 rows
