
DeFlow: Decoupling Manifold Modeling and Value Maximization for Offline Policy Extraction

About

We present DeFlow, a decoupled offline RL framework that leverages flow matching to faithfully capture complex behavior manifolds. Optimizing generative policies is computationally prohibitive, typically necessitating backpropagation through ODE solvers. We address this by learning a lightweight refinement module within an explicit, data-derived trust region of the flow manifold, rather than sacrificing the iterative generation capability via single-step distillation. This way, we bypass solver differentiation and eliminate the need for balancing loss terms, ensuring stable improvement while fully preserving the flow's iterative expressivity. Empirically, DeFlow achieves superior performance on the challenging OGBench benchmark and demonstrates efficient offline-to-online adaptation.
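The decoupling described above can be illustrated with a minimal sketch. This is not DeFlow's implementation: `toy_velocity` is a hypothetical stand-in for a learned flow-matching velocity field, the correction `delta` is passed in by hand rather than produced by a learned refinement module, and the L2-ball projection with a fixed `radius` is only one assumed way to express a trust region. The point it shows is structural: the base action comes from ordinary iterative ODE integration, and the refinement is applied to the sampled output, so nothing needs to be differentiated through the solver loop.

```python
import math

def toy_velocity(action, t, state):
    """Hypothetical stand-in for a learned flow-matching velocity field.

    Moves the sample along a straight line toward a state-dependent target.
    In this toy, the action and state must have the same dimension.
    """
    target = [math.tanh(s) for s in state]
    return [(m - a) / (1.0 - t) for a, m in zip(action, target)]

def sample_base_action(state, noise, n_steps=10):
    """Integrate the flow ODE from noise (t=0) toward t=1 with Euler steps."""
    action = list(noise)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt  # largest t is 1 - dt, so 1 - t never reaches zero
        v = toy_velocity(action, t, state)
        action = [a + dt * vi for a, vi in zip(action, v)]
    return action

def refine(base_action, delta, radius):
    """Add a correction after projecting it onto an L2 ball of the given radius."""
    norm = math.sqrt(sum(d * d for d in delta))
    scale = min(1.0, radius / norm) if norm > 0 else 1.0
    return [a + scale * d for a, d in zip(base_action, delta)]

state = [0.5, -1.0]
noise = [0.3, -0.2]
base = sample_base_action(state, noise)          # iterative generation, no gradients needed
refined = refine(base, [3.0, 4.0], radius=0.1)   # oversized correction is shrunk to the ball
```

In this sketch an oversized correction is rescaled so the refined action stays within `radius` of the base action, while a correction already inside the ball is applied unchanged.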

Zhancun Mu • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Offline Reinforcement Learning | OGBench antmaze-giant-navigate-singletask task1-v0 to task5-v0 | Score | 12 | 33 |
| Offline Reinforcement Learning | D4RL antmaze 6 tasks | Normalized Score | 83 | 21 |
| Offline Reinforcement Learning | D4RL adroit (12 tasks) | Normalized Score | 52 | 21 |
| Offline Reinforcement Learning | OGBench cube-single-singletask 5 tasks | Normalized Score | 96 | 14 |
| Object Manipulation | OGBench cube-double-singletask | Score | 38 | 12 |
| soccer | OGBench antsoccer-arena-singletask | Score | 62 | 12 |
| Navigation | OGBench antmaze-large-singletask | Score | 81 | 12 |
| Navigation | OGBench humanoidmaze-large-singletask | Score | 5 | 12 |
| Navigation | D4RL AntMaze | Score | 81 | 12 |
| Object Manipulation | OGBench scene-singletask | Score | 51 | 12 |
Showing 10 of 39 rows
