Flow Q-Learning
About
We present flow Q-learning (FQL), a simple and performant offline reinforcement learning (RL) method that leverages an expressive flow-matching policy to model arbitrarily complex action distributions in the data. Training a flow policy with RL is tricky due to the iterative nature of the action generation process. We address this challenge by training an expressive one-step policy with RL, rather than directly guiding an iterative flow policy to maximize values. This way, we completely avoid unstable recursive backpropagation and eliminate costly iterative action generation at test time, while still largely preserving expressivity. We experimentally show that FQL leads to strong performance across 73 challenging state- and pixel-based OGBench and D4RL tasks in offline RL and offline-to-online RL. Project page: https://seohong.me/projects/fql/
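The two training objectives described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the velocity field `v` and the one-step policy are assumed to be arbitrary callables (in practice neural networks), and the loss weighting `alpha` is a hypothetical hyperparameter name.

```python
import numpy as np

def flow_matching_loss(v, states, actions, rng):
    """Behavioral-cloning loss for a velocity field v(s, a_t, t).

    a_t interpolates linearly between Gaussian noise (t=0) and the dataset
    action (t=1); the regression target is the constant velocity a1 - a0.
    """
    a0 = rng.standard_normal(actions.shape)       # noise endpoint
    t = rng.uniform(size=(actions.shape[0], 1))   # random time in [0, 1]
    a_t = (1 - t) * a0 + t * actions              # linear interpolation
    target = actions - a0                         # ODE velocity target
    pred = v(states, a_t, t)
    return np.mean((pred - target) ** 2)

def euler_sample(v, states, action_dim, n_steps, rng):
    """Generate actions by integrating the learned ODE with n_steps Euler steps
    (the costly iterative procedure FQL avoids at test time)."""
    a = rng.standard_normal((states.shape[0], action_dim))
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = np.full((states.shape[0], 1), k * dt)
        a = a + dt * v(states, a, t)
    return a

def one_step_policy_loss(one_step_actions, flow_actions, q_values, alpha):
    """One-step policy objective: distill the iterative flow policy's samples
    while maximizing Q; alpha (assumed name) trades off the two terms.
    No gradient flows through the Euler loop, so recursive backprop is avoided.
    """
    distill = np.mean((one_step_actions - flow_actions) ** 2)
    return alpha * distill - np.mean(q_values)
```

A trivially perfect one-step policy (identical actions to the flow samples) reduces the loss to the negated mean Q-value, which is the value-maximization term alone.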
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| hopper locomotion | D4RL hopper medium-replay | Normalized Score | 85.4 | 56 |
| Offline Reinforcement Learning | OGBench antmaze-large-navigate-singletask task1-v0 to task5-v0 | Score | 93 | 55 |
| walker2d locomotion | D4RL walker2d medium-replay | Normalized Score | 82.1 | 53 |
| Locomotion | D4RL walker2d-medium-expert | Normalized Score | 100.5 | 47 |
| Locomotion | D4RL Walker2d medium | Normalized Score | 72.7 | 44 |
| Offline Reinforcement Learning | D4RL antmaze-umaze (diverse) | Normalized Score | 89 | 40 |
| Offline Reinforcement Learning | D4RL MuJoCo Hopper medium standard | Normalized Score | 68.1 | 36 |
| Locomotion | D4RL HalfCheetah Medium-Replay | Normalized Score | 0.511 | 33 |
| Offline Reinforcement Learning | D4RL Adroit pen (cloned) | Normalized Return | 74 | 32 |
| Offline Reinforcement Learning | D4RL Adroit pen (human) | Normalized Return | 53 | 32 |