
Flow Q-Learning

About

We present flow Q-learning (FQL), a simple and performant offline reinforcement learning (RL) method that leverages an expressive flow-matching policy to model arbitrarily complex action distributions in the data. Training a flow policy with RL is tricky because of the iterative nature of the action generation process. We address this challenge by training an expressive one-step policy with RL, rather than directly guiding an iterative flow policy to maximize values. This way, we completely avoid unstable recursive backpropagation and eliminate costly iterative action generation at test time, while still largely maintaining expressivity. We experimentally show that FQL achieves strong performance across 73 challenging state- and pixel-based OGBench and D4RL tasks in both offline RL and offline-to-online RL. Project page: https://seohong.me/projects/fql/
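The abstract's core contrast is between iterative flow sampling (integrating a learned velocity field over many steps, which forces recursive backpropagation if trained with RL) and a distilled one-step policy that maps noise to an action in a single call. Below is a minimal toy sketch of that contrast in 1-D, assuming a hand-written velocity field in place of a trained network and an idealized distilled policy; `euler_sample`, `velocity_field`, and `one_step_policy` are illustrative names, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def euler_sample(velocity_field, x0, n_steps=100):
    """Iterative flow sampling: integrate the velocity field from
    noise at t=0 to an action at t=1 with Euler steps. An RL loss on
    the final action would have to backpropagate through this loop."""
    x, dt = x0, 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        x = x + dt * velocity_field(x, t)
    return x

# Toy stand-in for a trained velocity field: its flow transports
# N(0, 1) noise toward a fixed target action a* = 2.0 along the
# straight interpolation path used in flow matching.
target = 2.0
velocity_field = lambda x, t: (target - x) / max(1.0 - t, 1e-3)

noise = rng.standard_normal()
a_flow = euler_sample(velocity_field, noise)

# One-step policy: a single call maps noise directly to an action.
# In FQL this network is trained with RL (value maximization plus
# distillation from the flow policy), so no iterative generation is
# needed at test time. Here it is idealized as the exact flow output.
one_step_policy = lambda z: target
a_one_step = one_step_policy(noise)
```

With enough Euler steps the iterative sample converges to the same action the one-step policy emits directly, which is what makes the distillation target well-defined while avoiding the expensive sampling loop at deployment.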

Seohong Park, Qiyang Li, Sergey Levine • 2025

Related benchmarks

| Task                           | Dataset                                                        | Metric            | Result | Rank |
|--------------------------------|----------------------------------------------------------------|-------------------|--------|------|
| hopper locomotion              | D4RL hopper medium-replay                                      | Normalized Score  | 85.4   | 56   |
| Offline Reinforcement Learning | OGBench antmaze-large-navigate-singletask task1-v0 to task5-v0 | Score             | 93     | 55   |
| walker2d locomotion            | D4RL walker2d medium-replay                                    | Normalized Score  | 82.1   | 53   |
| Locomotion                     | D4RL walker2d-medium-expert                                    | Normalized Score  | 100.5  | 47   |
| Locomotion                     | D4RL Walker2d medium                                           | Normalized Score  | 72.7   | 44   |
| Offline Reinforcement Learning | D4RL antmaze-umaze (diverse)                                   | Normalized Score  | 89     | 40   |
| Offline Reinforcement Learning | D4RL MuJoCo Hopper medium standard                             | Normalized Score  | 68.1   | 36   |
| Locomotion                     | D4RL HalfCheetah Medium-Replay                                 | Normalized Score  | 0.511  | 33   |
| Offline Reinforcement Learning | D4RL Adroit pen (cloned)                                       | Normalized Return | 74     | 32   |
| Offline Reinforcement Learning | D4RL Adroit pen (human)                                        | Normalized Return | 53     | 32   |
(Showing 10 of 76 rows.)
