Energy-Weighted Flow Matching for Offline Reinforcement Learning

About

This paper investigates energy guidance in generative modeling, where the target distribution is defined as $q(\mathbf x) \propto p(\mathbf x)\exp(-\beta \mathcal E(\mathbf x))$, with $p(\mathbf x)$ being the data distribution and $\mathcal E(\mathbf x)$ as the energy function. To comply with energy guidance, existing methods often require auxiliary procedures to learn intermediate guidance during the diffusion process. To overcome this limitation, we explore energy-guided flow matching, a generalized form of the diffusion process. We introduce energy-weighted flow matching (EFM), a method that directly learns the energy-guided flow without the need for auxiliary models. Theoretical analysis shows that energy-weighted flow matching accurately captures the guided flow. Additionally, we extend this methodology to energy-weighted diffusion models and apply it to offline reinforcement learning (RL) by proposing the Q-weighted Iterative Policy Optimization (QIPO). Empirically, we demonstrate that the proposed QIPO algorithm improves performance in offline RL tasks. Notably, our algorithm is the first energy-guided diffusion model that operates independently of auxiliary models and the first exact energy-guided flow matching model in the literature.
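As a rough illustration of the target distribution $q(\mathbf x) \propto p(\mathbf x)\exp(-\beta \mathcal E(\mathbf x))$ defined above, the sketch below reweights samples drawn from $p$ with self-normalized weights proportional to $\exp(-\beta \mathcal E(\mathbf x))$. This is plain importance weighting, not the paper's EFM or QIPO algorithm. The standard normal choice of $p$, the quadratic `energy` function, the value of `beta`, and the helper name `energy_weights` are all assumptions made for the example.

```python
import numpy as np

def energy_weights(energies, beta):
    """Self-normalized weights w_i proportional to exp(-beta * E(x_i))."""
    logits = -beta * np.asarray(energies, dtype=float)
    logits = logits - logits.max()  # subtract the max for numerical stability
    w = np.exp(logits)
    return w / w.sum()

def energy(x):
    # Hypothetical energy: low near x = 2, so guidance pulls mass toward 2.
    return 0.5 * (x - 2.0) ** 2

rng = np.random.default_rng(0)
beta = 1.0

# Assumed data distribution p: a 1-D standard normal.
x = rng.standard_normal(200_000)

# Weighted samples (x_i, w_i) approximate expectations under q.
w = energy_weights(energy(x), beta)
q_mean = np.sum(w * x)
q_var = np.sum(w * (x - q_mean) ** 2)

# For this Gaussian/quadratic pair, q is Gaussian with
# mean 2 * beta / (1 + beta) and variance 1 / (1 + beta).
print(q_mean, q_var)
```

With these choices the weighted mean and variance should land close to the closed-form values in the final comment, which gives a quick sanity check that the weights target $q$ rather than $p$.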

Shiyuan Zhang, Weitong Zhang, Quanquan Gu • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Offline Reinforcement Learning | D4RL antmaze-umaze (diverse) | Normalized Score | 76.1 | 74 |
| Offline Reinforcement Learning | D4RL Gym walker2d (medium-replay) | Normalized Return | 90.1 | 73 |
| Offline Reinforcement Learning | D4RL Gym halfcheetah-medium | Normalized Return | 54.2 | 65 |
| Offline Reinforcement Learning | D4RL Gym walker2d medium | Normalized Return | 87.6 | 63 |
| Offline Reinforcement Learning | D4RL Gym hopper (medium-replay) | Normalized Return | 101.2 | 49 |
| Offline Reinforcement Learning | D4RL Gym halfcheetah-medium-replay | Normalized Average Return | 48 | 48 |
| Offline Reinforcement Learning | D4RL antmaze-large (diverse) | Normalized Score | 32.1 | 47 |
| Offline Reinforcement Learning | D4RL Gym hopper-medium | Normalized Return | 94 | 46 |
| Offline Reinforcement Learning | D4RL MuJoCo halfcheetah-medium-expert | Normalized Score | 94.5 | 43 |
| Offline Reinforcement Learning | D4RL Gym walker2d medium-expert | Normalized Average Return | 110.9 | 43 |
Showing 10 of 60 rows
