
Towards General-Purpose Model-Free Reinforcement Learning

About

Reinforcement learning (RL) promises a framework for near-universal problem-solving. In practice, however, RL algorithms are often tailored to specific benchmarks, relying on carefully tuned hyperparameters and algorithmic choices. Recently, powerful model-based RL methods have shown impressive general results across benchmarks, but at the cost of increased complexity and slow run times, which limits their broader applicability. In this paper, we attempt to find a unifying model-free deep RL algorithm that can address a diverse class of domains and problem settings. To achieve this, we leverage model-based representations that approximately linearize the value function, taking advantage of the denser task objectives used by model-based RL while avoiding the costs associated with planning or simulated trajectories. We evaluate our algorithm, MR.Q, on a variety of common RL benchmarks with a single set of hyperparameters and show competitive performance against domain-specific and general baselines, providing a concrete step towards building general-purpose model-free deep RL algorithms.

Scott Fujimoto, Pierluca D'Oro, Amy Zhang, Yuandong Tian, Michael Rabbat (2025)
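The abstract says MR.Q uses model-based representations that "approximately linearize the value function." One way to see the intuition is a textbook linear-algebra argument, sketched below. This is not MR.Q's implementation. It assumes a feature vector phi(s, a) for which the reward is exactly linear (r = phi . w_r) and the expected next-step feature is linear (E[phi'] = phi M). Under those assumptions Q(s, a) = phi . theta with theta = w_r + gamma M theta, so a linear head on phi is enough to represent the value. The names `w_r`, `M`, `linear_value_weights`, `GAMMA`, and all numbers are hypothetical.

```python
# Illustrative sketch only (not MR.Q): assumes reward and expected next-step
# features are exactly linear in a feature vector phi(s, a).

GAMMA = 0.9  # discount factor (hypothetical value)


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def matvec(M, v):
    """Matrix times column vector: returns M @ v."""
    return [dot(row, v) for row in M]


def vecmat(u, M):
    """Row vector times matrix: returns u @ M."""
    return [sum(u[i] * M[i][j] for i in range(len(u))) for j in range(len(M[0]))]


def linear_value_weights(w_r, M, gamma=GAMMA, iters=1000):
    """Solve theta = w_r + gamma * M @ theta by fixed-point iteration.

    Converges when gamma * M is a contraction in the max-norm,
    e.g. when M is row-stochastic and gamma < 1.
    """
    theta = [0.0] * len(w_r)
    for _ in range(iters):
        theta = [r + gamma * x for r, x in zip(w_r, matvec(M, theta))]
    return theta


# Hypothetical 3-dimensional example.
w_r = [1.0, 0.0, 0.5]          # reward head: r = phi . w_r
M = [[0.5, 0.5, 0.0],          # feature dynamics: E[phi'] = phi @ M
     [0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5]]
theta = linear_value_weights(w_r, M)

phi = [0.2, 0.3, 0.5]          # features of some (s, a)
q = dot(phi, theta)            # Q(s, a) as a linear function of phi
# Bellman target r + gamma * E[Q(s', a')], using E[phi'] = phi @ M.
target = dot(phi, w_r) + GAMMA * dot(vecmat(phi, M), theta)
residual = q - target          # close to zero: the Bellman equation holds
print(f"Q = {q:.4f}, Bellman residual = {residual:.1e}")
```

With one-hot features, M becomes the state-action transition matrix under the policy and theta is the tabular Q-function, so the tabular case is a special instance. Learned representations only satisfy these linearity conditions approximately, which is why the abstract speaks of approximate linearization.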

Related benchmarks

Task | Dataset | Metric | Result | Rank
--- | --- | --- | --- | ---
Continuous Control | MuJoCo Ant v4 | Average Return | 6.90e+3 | 46
Continuous Control | MuJoCo Walker2d v4 | -- | -- | 39
Continuous Control | MuJoCo HalfCheetah v4 | Average Return | 1.29e+4 | 36
Locomotion | Dog & Humanoid suite | IQM | 0.796 | 32
Continuous Control | Gym MuJoCo Humanoid v4 | Average Return | 1.02e+4 | 15
Continuous Control | Gym MuJoCo Suite Aggregate | IQM | 1.499 | 15
Continuous Control | Gym MuJoCo Hopper v4 | Average Return | 2.69e+3 | 15
Continuous Control | DeepMind Control (DMC) Suite (1M steps) | IQM | 83 | 14
Continuous Control | DMC Suite Hard v1 (test) | Dog Run Return | 569 | 12
Continuous Control | HumanoidBench (w/ Hand) | Return (Slide) | 146 | 12

(10 of 20 rows shown.)
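Several rows report IQM (interquartile mean): an aggregate that discards the lowest and highest 25% of scores and averages the remaining middle half, making it less sensitive to outlier runs than a plain mean. The sketch below is a minimal illustration under stated assumptions: the scores are already normalized (the page does not say which normalization each benchmark uses), the trimming follows the same convention as SciPy's `trim_mean(x, 0.25)` (floor(n/4) values cut from each end), and the function name `interquartile_mean` and the sample scores are hypothetical.

```python
def interquartile_mean(scores):
    """Mean of the middle 50% of scores after dropping floor(n/4) from each end."""
    if not scores:
        raise ValueError("scores must be non-empty")
    ordered = sorted(scores)
    cut = len(ordered) // 4  # number of scores trimmed from each tail
    middle = ordered[cut:len(ordered) - cut]
    return sum(middle) / len(middle)


# Hypothetical normalized scores from 8 runs; 2.0 is an outlier run.
scores = [0.1, 0.4, 0.5, 0.6, 0.9, 2.0, 0.3, 0.7]
iqm = interquartile_mean(scores)
plain_mean = sum(scores) / len(scores)
print(f"IQM = {iqm:.3f}, mean = {plain_mean:.4f}")
```

Here the single outlier run pulls the plain mean upward, while the IQM ignores it because it falls in the trimmed top quarter.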
