Scalable Offline Model-Based RL with Action Chunks

About

In this paper, we study whether model-based reinforcement learning (RL), in particular model-based value expansion, can provide a scalable recipe for tackling complex, long-horizon tasks in offline RL. Model-based value expansion fits an on-policy value function using length-n imaginary rollouts generated by the current policy and a learned dynamics model. While a larger n reduces bias in value bootstrapping, it amplifies accumulated model errors over long horizons, degrading future predictions. We address this trade-off with an action-chunk model that predicts a future state from a sequence of actions (an "action chunk") instead of a single action, which reduces compounding errors. In addition, instead of directly training a policy to maximize rewards, we employ rejection sampling from an expressive behavioral action-chunk policy, which prevents model exploitation from out-of-distribution actions. We call this recipe Model-Based RL with Action Chunks (MAC). Through experiments on highly challenging tasks with large-scale datasets of up to 100M transitions, we show that MAC achieves the best performance among offline model-based RL algorithms, especially on challenging long-horizon tasks.
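The rejection-sampling step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the policy, dynamics model, reward, and value function below are hypothetical toy stand-ins (in practice these would be learned networks), and the scoring rule (discounted chunk reward plus a bootstrapped value at the predicted state) is an assumption about how candidates might be ranked. It only shows the general shape of the idea: draw several action chunks from a behavioral policy, predict one future state per chunk with a single model call, and keep the highest-scoring chunk.

```python
import numpy as np

def sample_action_chunks(rng, state, num_candidates, chunk_len, action_dim):
    """Hypothetical behavioral chunk policy. This toy ignores the state and
    samples uniformly; a real one would be a learned generative model."""
    return rng.uniform(-1.0, 1.0, size=(num_candidates, chunk_len, action_dim))

def chunk_model(state, chunks):
    """Toy action-chunk model: maps (state, whole chunk) to one future state
    in a single call, rather than unrolling a one-step model chunk_len times."""
    return state + chunks.sum(axis=1)  # shape (N, state_dim)

def chunk_reward(chunks, gamma):
    """Toy discounted reward over the chunk: a small action-magnitude penalty."""
    discounts = gamma ** np.arange(chunks.shape[1])    # shape (H,)
    per_step = -0.01 * (chunks ** 2).sum(axis=-1)      # shape (N, H)
    return (per_step * discounts).sum(axis=1)          # shape (N,)

def value_fn(states):
    """Toy value function: negative distance to a goal at the origin."""
    return -np.linalg.norm(states, axis=-1)

def select_chunk(rng, state, num_candidates=64, chunk_len=4, gamma=0.99):
    """Score sampled chunks and return the best one together with all scores."""
    chunks = sample_action_chunks(rng, state, num_candidates, chunk_len, state.shape[0])
    next_states = chunk_model(state, chunks)
    # Chunk-level value expansion: reward within the chunk, then bootstrap.
    scores = chunk_reward(chunks, gamma) + gamma ** chunk_len * value_fn(next_states)
    return chunks[int(np.argmax(scores))], scores

rng = np.random.default_rng(0)
state = np.array([1.0, -2.0])
best_chunk, scores = select_chunk(rng, state, num_candidates=32)
print(best_chunk.shape, scores.shape)
```

Because every candidate is drawn from the behavioral policy, the selected chunk stays within the support of that policy, which is the property the abstract relies on to avoid exploiting the model with out-of-distribution actions.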

Kwanyoung Park, Seohong Park, Youngwoon Lee, Sergey Levine • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Offline Reinforcement Learning | puzzle-4x4-play OGBench 5 tasks v0 | Average Success Rate | 78 | 18 |
| Manipulation | OG-Bench cube-double-play-oraclerep v0 | Success Rate | 100 | 10 |
| Manipulation | OG-Bench cube-octuple-play-oraclerep v0 | Success Rate | 3.00e+3 | 10 |
| Manipulation | OG-Bench puzzle-4x5-play-oraclerep v0 | Success Rate | 99 | 10 |
| Manipulation | OG-Bench puzzle-3x3-play-oraclerep v0 | Success Rate | 1 | 10 |
| Locomotion | OG-Bench humanoidmaze-medium-navigate-oraclerep v0 | Success Rate | 36 | 10 |
| Locomotion | OG-Bench humanoidmaze-giant-navigate-oraclerep v0 | Success Rate | 0.00e+0 | 10 |
| Offline Reinforcement Learning | OGBench cube-single-play 5 tasks v0 | Average Success Rate | 0.99 | 9 |
| Offline Reinforcement Learning | cube-double-play OGBench 5 tasks v0 | Average Success Rate | 53 | 9 |
| Offline Reinforcement Learning | scene-play OGBench 5 tasks v0 | Average Success Rate | 97 | 9 |
Showing 10 of 11 rows
