
Reinforcement Learning with Action Chunking

About

We present Q-chunking, a simple yet effective recipe for improving reinforcement learning (RL) algorithms for long-horizon, sparse-reward tasks. Our recipe is designed for the offline-to-online RL setting, where the goal is to leverage an offline prior dataset to maximize the sample-efficiency of online learning. Effective exploration and sample-efficient learning remain central challenges in this setting, as it is not obvious how the offline data should be utilized to acquire a good exploratory policy. Our key insight is that action chunking, a technique popularized in imitation learning where sequences of future actions are predicted rather than a single action at each timestep, can be applied to temporal difference (TD)-based RL methods to mitigate the exploration challenge. Q-chunking adopts action chunking by directly running RL in a 'chunked' action space, enabling the agent to (1) leverage temporally consistent behaviors from offline data for more effective online exploration and (2) use unbiased $n$-step backups for more stable and efficient TD learning. Our experimental results demonstrate that Q-chunking exhibits strong offline performance and online sample efficiency, outperforming prior best offline-to-online methods on a range of long-horizon, sparse-reward manipulation tasks.
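The abstract describes the chunked backup only in words. The following is a minimal sketch of one way to read it, assuming the critic scores a state together with a chunk of h consecutive actions and that the TD target sums the h rewards collected while that chunk runs. The function names, argument names, and trajectory layout are hypothetical illustrations, not the authors' implementation.

```python
from typing import List, Sequence, Tuple


def chunked_td_target(rewards: Sequence[float],
                      next_chunk_value: float,
                      gamma: float) -> float:
    """h-step TD target for one action chunk (illustrative sketch).

    rewards          -- the h rewards observed while the chunk was executed
    next_chunk_value -- critic estimate for the state reached after the chunk,
                        paired with the next chunk of actions (assumed given)
    gamma            -- per-step discount factor
    """
    h = len(rewards)
    # Discounted sum of the rewards inside the chunk.
    discounted = sum(gamma ** i * r for i, r in enumerate(rewards))
    # Bootstrap from the value h steps ahead.
    return discounted + gamma ** h * next_chunk_value


def make_chunk_transitions(states: Sequence, actions: Sequence,
                           rewards: Sequence[float],
                           h: int) -> List[Tuple]:
    """Slice one trajectory into (s_t, a_{t:t+h}, r_{t:t+h}, s_{t+h}) tuples.

    Assumes len(states) == len(actions) + 1 == len(rewards) + 1.
    """
    T = len(actions)
    return [
        (states[t], tuple(actions[t:t + h]), list(rewards[t:t + h]), states[t + h])
        for t in range(T - h + 1)
    ]
```

Under this reading, the backup can avoid the usual off-policy bias of n-step returns because every action that produced the summed rewards is an input to the critic, rather than being drawn from an older behavior policy that the critic does not condition on.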

Qiyang Li, Zhiyuan Zhou, Sergey Levine • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Robotic Manipulation | Robomimic Can | Success Rate: 94 | 12 |
| Robotic Manipulation | Robomimic Lift | Success Rate: 100 | 12 |
| Robotic Manipulation | Robomimic Square | Success Rate: 92 | 12 |
| Offline Goal-Conditioned Reinforcement Learning | humanoidmaze giant | Success Rate: 4.80e+3 | 10 |
| Offline Goal-Conditioned Reinforcement Learning | puzzle 4x5 | Success Rate: 2.00e+3 | 10 |
| Offline Goal-Conditioned Reinforcement Learning | puzzle-4x6-1B | Success Rate: 2.80e+3 | 10 |
| Offline Goal-Conditioned Reinforcement Learning | cube-quadruple 100M | Success Rate: 35 | 10 |
| Offline Goal-Conditioned Reinforcement Learning | cube-triple 100M | Success Rate: 20 | 10 |
| Offline Goal-Conditioned Reinforcement Learning | cube-octuple-1B | Success Rate: 0.00e+0 | 10 |
| Language-guided robot manipulation | LIBERO-Spatial 5-shot (test) | Success Rate: 46 | 5 |
Showing 10 of 21 rows
