DeFlow: Decoupling Manifold Modeling and Value Maximization for Offline Policy Extraction

About

We present DeFlow, a decoupled offline RL framework that leverages flow matching to faithfully capture complex behavior manifolds. Optimizing generative policies is computationally prohibitive, typically necessitating backpropagation through ODE solvers. We address this by learning a lightweight refinement module within an explicit, data-derived trust region of the flow manifold, rather than sacrificing the iterative generation capability via single-step distillation. This way, we bypass solver differentiation and eliminate the need for balancing loss terms, ensuring stable improvement while fully preserving the flow's iterative expressivity. Empirically, DeFlow achieves superior performance on the challenging OGBench benchmark and demonstrates efficient offline-to-online adaptation.
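The decoupling described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the fixed L2 `radius`, the toy velocity field, and the hand-written `delta` are all hypothetical stand-ins. In the paper the trust region is derived from data and the correction comes from a learned refinement module. The sketch only shows the structure: a flow policy generates an action by iterative ODE integration, and a separate bounded correction is applied afterwards, so improving the correction does not require differentiating through the solver.

```python
import numpy as np

def sample_flow_action(velocity_fn, state, noise, num_steps=10):
    """Generate an action by Euler-integrating da/dt = velocity_fn(state, a, t) over t in [0, 1)."""
    action = noise.copy()
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        action = action + dt * velocity_fn(state, action, t)
    return action

def refine_within_trust_region(base_action, delta, radius):
    """Add a correction to the flow output, rescaling it so its L2 norm never exceeds `radius`."""
    norm = np.linalg.norm(delta)
    if norm > radius:
        delta = delta * (radius / norm)
    return base_action + delta

# Toy setup (assumption): the velocity field pulls every sample toward a fixed target action.
target = np.array([0.5, -0.25])

def toy_velocity(state, action, t):
    # Straight-line flow: the remaining displacement divided by the remaining time.
    return (target - action) / (1.0 - t)

rng = np.random.default_rng(0)
state = np.zeros(3)                      # placeholder observation, unused by the toy field
noise = rng.standard_normal(2)

base = sample_flow_action(toy_velocity, state, noise)

# Hypothetical correction proposed by a refinement module; here it is just a fixed vector.
delta = np.array([3.0, 4.0])             # L2 norm 5.0, larger than the radius below
refined = refine_within_trust_region(base, delta, radius=0.1)
```

In this toy case `base` lands on `target`, and `refined` stays within distance 0.1 of `base` regardless of how large the proposed correction is.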

Zhancun Mu • 2026

Related benchmarks

Task	Dataset	Result
Offline Reinforcement Learning	OGBench antmaze-giant-navigate-singletask task1-v0 to task5-v0	Score: 12	33
Offline Reinforcement Learning	D4RL antmaze 6 tasks	Normalized Score: 83	21
Offline Reinforcement Learning	D4RL adroit (12 tasks)	Normalized Score: 52	21
Offline Reinforcement Learning	OGBench cube-single-singletask 5 tasks	Normalized Score: 96	14
Object Manipulation	OGBench cube-double-singletask	Score: 38	12
Soccer	OGBench antsoccer-arena-singletask	Score: 62	12
Navigation	OGBench antmaze-large-singletask	Score: 81	12
Navigation	OGBench humanoidmaze-large-singletask	Score: 5	12
Navigation	D4RL AntMaze	Score: 81	12
Object Manipulation	OGBench scene-singletask	Score: 51	12

Showing 10 of 39 rows
