Reinforcement Learning with Action Chunking
About
We present Q-chunking, a simple yet effective recipe for improving reinforcement learning (RL) algorithms on long-horizon, sparse-reward tasks. Our recipe is designed for the offline-to-online RL setting, where the goal is to leverage an offline prior dataset to maximize the sample efficiency of online learning. Effective exploration and sample-efficient learning remain central challenges in this setting, as it is not obvious how the offline data should be utilized to acquire a good exploratory policy. Our key insight is that action chunking, a technique popularized in imitation learning where sequences of future actions are predicted rather than a single action at each timestep, can be applied to temporal difference (TD)-based RL methods to mitigate the exploration challenge. Q-chunking adopts action chunking by directly running RL in a 'chunked' action space, enabling the agent to (1) leverage temporally consistent behaviors from offline data for more effective online exploration and (2) use unbiased $n$-step backups for more stable and efficient TD learning. Our experimental results demonstrate that Q-chunking exhibits strong offline performance and online sample efficiency, outperforming prior best offline-to-online methods on a range of long-horizon, sparse-reward manipulation tasks.
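The unbiased $n$-step backup mentioned above can be sketched concretely: because the agent commits to an entire chunk of $h$ actions, the $h$ within-chunk rewards are generated on-policy, so they can be summed into the TD target without any off-policy correction. The helper below is a minimal illustration under assumed names (not code from the paper):

```python
def chunked_td_target(rewards, bootstrap_q, gamma, done):
    """h-step TD target for a Q-function defined over action chunks.

    rewards:     the h per-step rewards collected while executing one chunk
    bootstrap_q: Q(s_{t+h}, a_{t+h:t+2h}) from the target network (assumed)
    gamma:       per-step discount factor
    done:        whether the episode terminated within/at the end of the chunk

    Since the whole chunk is executed as committed, summing its rewards
    yields an unbiased n-step return with n = h.
    """
    h = len(rewards)
    target = sum((gamma ** t) * rewards[t] for t in range(h))
    # Bootstrap from the next chunk's Q-value unless the episode ended.
    target += (gamma ** h) * (0.0 if done else bootstrap_q)
    return target
```

For example, with a sparse reward arriving at the first step of a 3-step chunk, `chunked_td_target([1.0, 0.0, 0.0], 10.0, 0.9, False)` returns `1 + 0.9**3 * 10 = 8.29`.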
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Robotic Manipulation | Robomimic Can | Success Rate | 94 | 12 |
| Robotic Manipulation | Robomimic Lift | Success Rate | 100 | 12 |
| Robotic Manipulation | Robomimic Square | Success Rate | 92 | 12 |
| Offline Goal-Conditioned Reinforcement Learning | humanoidmaze giant | Success Rate | 4800 | 10 |
| Offline Goal-Conditioned Reinforcement Learning | puzzle 4x5 | Success Rate | 2000 | 10 |
| Offline Goal-Conditioned Reinforcement Learning | puzzle-4x6-1B | Success Rate | 2800 | 10 |
| Offline Goal-Conditioned Reinforcement Learning | cube-quadruple 100M | Success Rate | 35 | 10 |
| Offline Goal-Conditioned Reinforcement Learning | cube-triple 100M | Success Rate | 20 | 10 |
| Offline Goal-Conditioned Reinforcement Learning | cube-octuple-1B | Success Rate | 0 | 10 |
| Language-guided robot manipulation | LIBERO-Spatial 5-shot (test) | Success Rate | 46 | 5 |