Towards General-Purpose Model-Free Reinforcement Learning

About

Reinforcement learning (RL) promises a framework for near-universal problem-solving. In practice, however, RL algorithms are often tailored to specific benchmarks, relying on carefully tuned hyperparameters and algorithmic choices. Recently, powerful model-based RL methods have shown impressive general results across benchmarks, but they come at the cost of increased complexity and slow run times, limiting their broader applicability. In this paper, we attempt to find a unifying model-free deep RL algorithm that can address a diverse class of domains and problem settings. To achieve this, we leverage model-based representations that approximately linearize the value function, taking advantage of the denser task objectives used by model-based RL while avoiding the costs associated with planning or simulated trajectories. We evaluate our algorithm, MR.Q, on a variety of common RL benchmarks with a single set of hyperparameters and show competitive performance against domain-specific and general baselines, providing a concrete step towards building general-purpose model-free deep RL algorithms.

Scott Fujimoto, Pierluca D'Oro, Amy Zhang, Yuandong Tian, Michael Rabbat • 2025
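
The abstract's phrase "representations that approximately linearize the value function" can be illustrated with a toy calculation. The sketch below is not the MR.Q implementation; it assumes a small tabular Markov chain under a fixed policy, hand-picked state features `Phi`, and made-up transition and reward values. If the reward is linear in the features (r = Phi w_r) and the expected next features are linear in the current features (P Phi = Phi M), then the value function is exactly linear in the same features, with weights w = (I - gamma M)^{-1} w_r.

```python
import numpy as np

# Hypothetical 4-state Markov chain under a fixed policy (values are made up).
P = np.array([
    [0.1, 0.2, 0.3, 0.4],
    [0.3, 0.0, 0.5, 0.2],
    [0.2, 0.4, 0.1, 0.3],
    [0.5, 0.1, 0.4, 0.0],
])
gamma = 0.9

# Assumed 2-dimensional features: states {0, 1} and {2, 3} share a feature.
Phi = np.array([
    [1.0, 0.0],
    [1.0, 0.0],
    [0.0, 1.0],
    [0.0, 1.0],
])

# Reward chosen to be linear in the features: r = Phi @ w_r.
w_r = np.array([1.0, -0.5])
r = Phi @ w_r

# Fit linear feature dynamics M such that P @ Phi is approximately Phi @ M.
M, *_ = np.linalg.lstsq(Phi, P @ Phi, rcond=None)
dynamics_error = np.abs(P @ Phi - Phi @ M).max()

# Value weights in feature space: w = (I - gamma * M)^{-1} w_r.
w = np.linalg.solve(np.eye(Phi.shape[1]) - gamma * M, w_r)
V_linear = Phi @ w

# Exact tabular value function for comparison: V = (I - gamma * P)^{-1} r.
V_exact = np.linalg.solve(np.eye(P.shape[0]) - gamma * P, r)

print("max dynamics error:", dynamics_error)
print("max value error:", np.abs(V_linear - V_exact).max())
```

In this constructed example both linearity conditions hold exactly, so the two value estimates agree up to floating-point error. With learned features the conditions would hold only approximately, which is consistent with the abstract's wording.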

Related benchmarks

Task	Dataset	Metric	Value
Continuous Control	MuJoCo Ant v4	Average Return	6.90e+3	46
Continuous Control	MuJoCo Walker2d v4	--	--	39
Continuous Control	MuJoCo HalfCheetah v4	Average Return	1.29e+4	36
Locomotion	Dog & Humanoid suite	IQM	0.796	32
Continuous Control	Gym MuJoCo Humanoid v4	Average Return	1.02e+4	15
Continuous Control	Gym MuJoCo Suite Aggregate	IQM	1.499	15
Continuous Control	Gym MuJoCo Hopper v4	Average Return	2.69e+3	15
Continuous Control	DeepMind Control (DMC) Suite (1M steps)	IQM	83	14
Continuous Control	DMC Suite Hard v1 (test)	Dog Run Return	569	12
Continuous Control	HumanoidBench (w/ Hand)	Return (Slide)	146	12

Showing 10 of 20 rows
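
Several rows above report IQM, the interquartile mean. As a minimal sketch, assuming the common definition (a 25% trimmed mean: sort the scores, drop the lowest and highest quarter, average the rest), it can be computed as follows. The function name `iqm` and the sample scores are hypothetical and are not taken from the table; this page does not state how its IQM values were aggregated.

```python
import numpy as np

def iqm(scores):
    """Interquartile mean: the mean of the middle 50% of the sorted scores."""
    x = np.sort(np.asarray(scores, dtype=float))
    k = int(0.25 * x.size)           # number of scores trimmed from each end
    return x[k:x.size - k].mean()

# Made-up normalized scores with one outlier (5.0).
sample_scores = [0.1, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 5.0]
sample_iqm = iqm(sample_scores)      # averages 0.6, 0.7, 0.8, 0.9
sample_mean = float(np.mean(sample_scores))
print(sample_iqm, sample_mean)
```

Because the extreme scores are trimmed before averaging, the outlier pulls the plain mean up but leaves the IQM unchanged.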
