ACSAC: Adaptive Chunk Size Actor-Critic with Causal Transformer Q-Network

About

Long-horizon, sparse-reward tasks pose a fundamental challenge for reinforcement learning, since single-step TD learning suffers from bootstrapping error accumulation across successive Bellman updates. Actor-critic methods with action chunking address this by operating over temporally extended actions, which reduce the effective horizon, enable fast value backups, and support temporally consistent exploration. However, existing methods rely on a fixed chunk size and therefore cannot adaptively balance reactivity against temporal consistency. A large fixed chunk size reduces responsiveness to new observations, while a small one produces incoherent motions, forcing task-specific tuning of the chunk size. To address this limitation, we propose Adaptive Chunk Size Actor-Critic (ACSAC). ACSAC leverages a causal Transformer critic to evaluate expected returns for action chunks of different sizes. At each chunk boundary, it adaptively selects the chunk size that maximizes the expected return, supporting flexible, state-dependent chunk sizes without task-specific tuning. We prove that the ACSAC Bellman operator is a contraction whose unique fixed point is the action-value function of the adaptive policy. Experiments on OGBench demonstrate that ACSAC achieves state-of-the-art performance on long-horizon, sparse-reward manipulation tasks across both offline RL and offline-to-online RL settings.
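The chunk-size selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the critic returns one Q-value per prefix of a proposed action chunk (the value of executing only the first action, the first two, and so on), and the names `select_chunk_size`, `act_at_boundary`, `propose_chunk`, and `prefix_critic` are hypothetical placeholders introduced here for illustration.

```python
from typing import Any, Callable, List, Sequence


def select_chunk_size(prefix_q_values: Sequence[float]) -> int:
    """Return the 1-indexed chunk size whose prefix has the highest Q-value.

    Assumption: prefix_q_values[i] estimates the return of executing the
    first i + 1 actions of the proposed chunk. Ties go to the shorter chunk.
    """
    if not prefix_q_values:
        raise ValueError("need at least one candidate chunk size")
    best_index = max(range(len(prefix_q_values)), key=lambda i: prefix_q_values[i])
    return best_index + 1


def act_at_boundary(
    state: Any,
    propose_chunk: Callable[[Any], List[Any]],
    prefix_critic: Callable[[Any, List[Any]], Sequence[float]],
) -> List[Any]:
    """At a chunk boundary, propose a full-length chunk and keep its best prefix."""
    chunk = propose_chunk(state)            # hypothetical actor: full-length chunk
    q_values = prefix_critic(state, chunk)  # hypothetical critic: one value per prefix
    if len(q_values) != len(chunk):
        raise ValueError("critic must return one Q-value per chunk prefix")
    size = select_chunk_size(q_values)
    return chunk[:size]                     # execute these, then re-plan
```

With toy stand-ins for the actor and critic, `act_at_boundary(0, lambda s: [s + 1, s + 2, s + 3], lambda s, c: [0.2, 0.9, 0.5])` keeps the first two actions, since the second prefix has the highest estimated return. In this sketch, the next decision would be made only after the selected actions have been executed.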

Qian Chen, Junqiao Zhao, Hongtu Zhou, Hang Yu, Yanping Zhao, Chen Ye, Guang Chen • 2026

Related benchmarks

Task                  Dataset                    Metric                               Result  Rank
Manipulation          OGBench Manipulation       Scene-Sparse Success Rate (5 Tasks)  98      13
Robotic Manipulation  OGBench scene-sparse       Offline Performance                  99      10
Robotic Manipulation  OGBench cube-double        Offline Performance Score            84      10
Robotic Manipulation  OGBench cube-triple        Offline Performance                  19      10
Robotic Manipulation  OGBench overall            Offline Performance Score            61      10
Robotic Manipulation  OGBench puzzle-3x3-sparse  Offline Success Rate                 100     10
Robotic Manipulation  OGBench cube-quadruple     Offline Performance                  5       10
