
Guided Flow Policy: Learning from High-Value Actions in Offline Reinforcement Learning

About

Offline reinforcement learning often relies on behavior regularization that enforces policies to remain close to the dataset distribution. However, such approaches fail to distinguish between high-value and low-value actions in their regularization components. We introduce Guided Flow Policy (GFP), which couples a multi-step flow-matching policy with a distilled one-step actor. The actor directs the flow policy through weighted behavior cloning to focus on cloning high-value actions from the dataset rather than indiscriminately imitating all state-action pairs. In turn, the flow policy constrains the actor to remain aligned with the dataset's best transitions while maximizing the critic. This mutual guidance enables GFP to achieve state-of-the-art performance across 144 state and pixel-based tasks from the OGBench, Minari, and D4RL benchmarks, with substantial gains on suboptimal datasets and challenging tasks. Webpage: https://simple-robotics.github.io/publications/guided-flow-policy/
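The weighted behavior cloning described above can be sketched as a value-weighted flow-matching regression: each dataset action's regression loss is scaled by an exponentiated critic advantage, so high-value actions dominate the cloning objective. This is an illustrative sketch only, not the authors' implementation; the function name, the mean-value baseline, and the clipped exponential weighting are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def value_weighted_fm_loss(pred_velocity, noise, actions, q_values, beta=1.0):
    """Illustrative value-weighted flow-matching loss (not the paper's code).

    For the linear interpolation x_t = (1 - t) * noise + t * action, the
    regression target velocity is (action - noise). Samples with higher
    critic values receive larger weights, so the flow policy preferentially
    clones high-value actions instead of imitating all transitions equally.
    """
    target = actions - noise
    adv = q_values - q_values.mean()                # crude value baseline (assumption)
    w = np.clip(np.exp(adv / beta), None, 100.0)    # clipped exponential weights
    per_sample = ((pred_velocity - target) ** 2).sum(axis=-1)
    return float((w * per_sample).mean())

# Toy batch: 4 two-dimensional actions.
actions = rng.normal(size=(4, 2))
noise = rng.normal(size=(4, 2))
pred = rng.normal(size=(4, 2))
q = rng.normal(size=4)
loss = value_weighted_fm_loss(pred, noise, actions, q)
```

A perfect velocity prediction (`actions - noise`) drives the loss to zero regardless of the weights, which is why the weighting reshapes *where* the policy fits the data rather than changing the optimum of each individual regression term.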

Franki Nguimatsia Tiofack, Théotime Le Hellard, Fabian Schramm, Nicolas Perrin-Gilbert, Justin Carpentier • 2025

Related benchmarks

| Task | Dataset | Metric | Value | Rank |
| --- | --- | --- | --- | --- |
| Offline Reinforcement Learning | OGBench antmaze-large-navigate-singletask task1-v0 to task5-v0 | Score | 95.6 | 55 |
| Offline Reinforcement Learning | OGBench antmaze-giant-navigate-singletask task1-v0 to task5-v0 | Score | 52.2 | 22 |
| Offline Reinforcement Learning | OGBench | Overall Score | 53.2 | 21 |
| Offline Reinforcement Learning | D4RL (various) | -- | -- | 16 |
| Offline Reinforcement Learning | OGBench Average (50 tasks) | Score | 51.8 | 11 |
| Offline Reinforcement Learning | OGBench humanoidmaze-medium-navigate-singletask task1-v0 to task5-v0 | Score | 95.8 | 7 |
| Offline Reinforcement Learning | Minari 21 tasks | Minari Adroit Score | 48.3 | 3 |
