DeFlow: Decoupling Manifold Modeling and Value Maximization for Offline Policy Extraction
About
We present DeFlow, a decoupled offline RL framework that leverages flow matching to faithfully capture complex behavior manifolds. Optimizing generative policies is computationally expensive because it typically requires backpropagation through ODE solvers. Rather than sacrificing iterative generation through single-step distillation, we learn a lightweight refinement module within an explicit, data-derived trust region of the flow manifold. This bypasses solver differentiation and removes the need to balance loss terms, ensuring stable improvement while fully preserving the flow's iterative expressivity. Empirically, DeFlow achieves superior performance on the challenging OGBench benchmark and demonstrates efficient offline-to-online adaptation.
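The decoupling described above can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`euler_sample`, `project_to_ball`, `refined_action`), the fixed-step Euler solver, and the fixed L2 radius standing in for the data-derived trust region are all assumptions made for illustration. The sketch only shows the structural idea that a base action comes from iterating the flow, and a separate correction is bounded before being added, so any value-maximization gradient would only need to reach the refinement function and not the solver.

```python
import math

def euler_sample(velocity, x0, steps=10):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        v = velocity(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

def project_to_ball(delta, radius):
    """Rescale delta so that its L2 norm does not exceed radius."""
    norm = math.sqrt(sum(d * d for d in delta))
    if norm <= radius or norm == 0.0:
        return list(delta)
    return [d * radius / norm for d in delta]

def refined_action(velocity, refine, x0, radius, steps=10):
    """Base action from the flow plus a bounded correction.

    `refine` is a hypothetical stand-in for a learned refinement module;
    `radius` is an assumed fixed trust-region size.
    """
    base = euler_sample(velocity, x0, steps)       # iterative generation, kept intact
    delta = project_to_ball(refine(base), radius)  # correction confined to the trust region
    return [b + d for b, d in zip(base, delta)]

if __name__ == "__main__":
    # Toy constant velocity field and a toy refinement that always proposes (3, 4).
    action = refined_action(
        velocity=lambda x, t: [1.0, -1.0],
        refine=lambda a: [3.0, 4.0],
        x0=[0.0, 0.0],
        radius=0.5,
    )
    print(action)
```

In this toy setup the proposed correction has norm 5 and is scaled down to the radius of 0.5 before being added, so the final action stays close to the flow sample regardless of how large the raw correction is.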
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Offline Reinforcement Learning | OGBench antmaze-giant-navigate-singletask task1-v0 to task5-v0 | Score | 12 | 33 |
| Offline Reinforcement Learning | D4RL antmaze 6 tasks | Normalized Score | 83 | 21 |
| Offline Reinforcement Learning | D4RL adroit (12 tasks) | Normalized Score | 52 | 21 |
| Offline Reinforcement Learning | OGBench cube-single-singletask 5 tasks | Normalized Score | 96 | 14 |
| Object Manipulation | OGBench cube-double-singletask | Score | 38 | 12 |
| Soccer | OGBench antsoccer-arena-singletask | Score | 62 | 12 |
| Navigation | OGBench antmaze-large-singletask | Score | 81 | 12 |
| Navigation | OGBench humanoidmaze-large-singletask | Score | 5 | 12 |
| Navigation | D4RL AntMaze | Score | 81 | 12 |
| Object Manipulation | OGBench scene-singletask | Score | 51 | 12 |