Horizon Reduction Makes RL Scalable

About

In this work, we study the scalability of offline reinforcement learning (RL) algorithms. In principle, a truly scalable offline RL algorithm should be able to solve any given problem, regardless of its complexity, given sufficient data, compute, and model capacity. We investigate if and how current offline RL algorithms match up to this promise on diverse, challenging, previously unsolved tasks, using datasets up to 1000x larger than typical offline RL datasets. We observe that despite scaling up data, many existing offline RL algorithms exhibit poor scaling behavior, saturating well below the maximum performance. We hypothesize that the horizon is the main cause behind the poor scaling of offline RL. We empirically verify this hypothesis through several analysis experiments, showing that long horizons indeed present a fundamental barrier to scaling up offline RL. We then show that various horizon reduction techniques substantially enhance scalability on challenging tasks. Based on our insights, we also introduce a minimal yet scalable method named SHARSA that effectively reduces the horizon. SHARSA achieves the best asymptotic performance and scaling behavior among our evaluation methods, showing that explicitly reducing the horizon unlocks the scalability of offline RL. Code: https://github.com/seohongpark/horizon-reduction
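The abstract does not say which horizon reduction techniques are evaluated. As a generic illustration of the idea, and not a description of SHARSA, the sketch below uses n-step returns, a standard way to shorten the chain of bootstrapped value backups. The function names `n_step_targets` and `num_backups` are hypothetical and are not taken from the linked repository.

```python
import math


def n_step_targets(rewards, values, n, gamma=0.99):
    """Compute n-step TD targets for one trajectory.

    rewards[t] is the reward observed after step t (length T).
    values[t] is a value estimate for state t (length T + 1);
    values[T] should be 0.0 if the trajectory ends in a terminal state.
    """
    T = len(rewards)
    if len(values) != T + 1:
        raise ValueError("values must have exactly one more entry than rewards")
    if n < 1:
        raise ValueError("n must be at least 1")

    targets = []
    for t in range(T):
        # Truncate the lookahead near the end of the trajectory.
        m = min(n, T - t)
        # Discounted sum of the next m observed rewards.
        ret = sum(gamma ** k * rewards[t + k] for k in range(m))
        # Bootstrap from the value estimate m steps ahead.
        targets.append(ret + gamma ** m * values[t + m])
    return targets


def num_backups(horizon, n):
    """Number of bootstrapped backups needed to propagate value
    information across `horizon` steps when each backup spans n steps."""
    return math.ceil(horizon / n)


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0, 1.0]
    values = [0.0, 0.0, 0.0, 0.0, 10.0]
    print(n_step_targets(rewards, values, n=1, gamma=0.5))
    print(n_step_targets(rewards, values, n=4, gamma=0.5))
    print(num_backups(1000, 1), num_backups(1000, 50))
```

With `n=1` each target bootstraps from the very next state, so information from the end of a 1000-step task must pass through 1000 successive backups. With `n=50` the same information needs only 20, which is the sense in which a larger `n` shortens the effective horizon for value learning.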

Seohong Park, Kevin Frans, Deepinder Mann, Benjamin Eysenbach, Aviral Kumar, Sergey Levine • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Locomotion | OG-Bench humanoidmaze-medium-navigate-oraclerep v0 | Success Rate 98 | 10 |
| Locomotion | OG-Bench humanoidmaze-giant-navigate-oraclerep v0 | Success Rate 82 | 10 |
| Manipulation | OG-Bench puzzle-3x3-play-oraclerep v0 | Success Rate 1 | 10 |
| Offline Goal-Conditioned Reinforcement Learning | cube-octuple-1B | Success Rate 3.40e+3 | 10 |
| Manipulation | OG-Bench cube-double-play-oraclerep v0 | Success Rate 95 | 10 |
| Manipulation | OG-Bench cube-octuple-play-oraclerep v0 | Success Rate 1.90e+3 | 10 |
| Manipulation | OG-Bench puzzle-4x5-play-oraclerep v0 | Success Rate 91 | 10 |
| Offline Goal-Conditioned Reinforcement Learning | cube-quadruple 100M | Success Rate 64 | 10 |
| Offline Goal-Conditioned Reinforcement Learning | cube-triple 100M | Success Rate 83 | 10 |
| Offline Goal-Conditioned Reinforcement Learning | puzzle-4x6-1B | Success Rate 6.40e+3 | 10 |
Showing 10 of 38 rows
