
GeMPO: Generalized Measure Matching for Online Diffusion Reinforcement Learning

About

A commonly used family of RL algorithms for diffusion policies applies softmax reweighting to samples from the behavior policy, which often induces an overly greedy policy and fails to use feedback from negative samples. In this work, we introduce GeMPO, a simple and unified framework that generalizes the reweighting scheme in diffusion RL from softmax to general monotonic functions. GeMPO revisits diffusion RL from a measure-matching perspective: first, we construct a virtual target policy measure by solving a regularized policy optimization objective; second, we minimize the divergence between the current policy and this target measure through reweighted flow matching. This formulation offers two key advantages: (i) it extends weight design beyond traditional exponential reweighting, allowing it to be tailored to diverse reward landscapes; and (ii) by relaxing the non-negativity constraint on the target measure, it provides a principled justification for negative reweighting. We provide interpretations of how negative reweighting actively repels the policy from suboptimal actions and thus facilitates exploration. Extensive empirical evaluations demonstrate that GeMPO achieves competitive or superior performance by leveraging these flexible weighting schemes, and we provide practical guidelines for selecting reweighting methods.
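
The contrast between softmax reweighting and a general monotonic weight that can go negative can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the tanh weight, the linear interpolation path, and the temperature and scale parameters are all assumptions chosen for illustration, and the abstract does not specify the weight functions GeMPO actually uses.

```python
import numpy as np

def softmax_weights(q, temperature=1.0):
    """Exponential (softmax) weights: always non-negative, sum to 1."""
    z = (q - q.max()) / temperature  # subtract max for numerical stability
    w = np.exp(z)
    return w / w.sum()

def centered_tanh_weights(q, scale=1.0):
    """Hypothetical monotonic weight: negative for below-average samples."""
    advantage = q - q.mean()
    return np.tanh(advantage / scale)

def reweighted_flow_matching_loss(velocity_fn, actions, weights, rng):
    """Weighted flow matching loss on a linear noise-to-action path.

    velocity_fn(x_t, t) is a placeholder for the policy's velocity network.
    """
    n, d = actions.shape
    noise = rng.standard_normal((n, d))
    t = rng.uniform(size=(n, 1))
    x_t = (1.0 - t) * noise + t * actions   # point on the straight-line path
    target = actions - noise                # velocity of that path
    pred = velocity_fn(x_t, t)
    per_sample = np.sum((pred - target) ** 2, axis=1)
    return float(np.mean(weights * per_sample))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q_values = np.array([0.0, 1.0, 2.0, 3.0])   # made-up value estimates
    actions = rng.standard_normal((4, 2))       # made-up behavior samples
    zero_velocity = lambda x_t, t: np.zeros_like(x_t)
    for name, w in [("softmax", softmax_weights(q_values)),
                    ("tanh", centered_tanh_weights(q_values))]:
        loss = reweighted_flow_matching_loss(zero_velocity, actions, w, rng)
        print(name, w, loss)
```

In this sketch, a negative weight turns the per-sample regression term into something the optimizer tries to increase, which pushes the predicted velocity away from that sample. A loss of this form is no longer bounded below, so a practical method needs some safeguard; how GeMPO handles this is not stated in the abstract.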

Haitong Ma, Chenxiao Gao, Tianyi Chen, Na Li, Bo Dai • 2026

Related benchmarks

Task                     Dataset                 Result                   Rank
Reinforcement Learning   MuJoCo Half-Cheetah     Average Return 1.39e+4   28
Reinforcement Learning   MuJoCo Hopper           Average Return 3.00e+3   24
Reinforcement Learning   MuJoCo Ant              Average Return 5.98e+3   24
Reinforcement Learning   Swimmer                 Average Returns 69       24
DNA Sequence Generation  Pred-Activity           Pred-Activity 7.62       13
Reinforcement Learning   MuJoCo Humanoid         Average Return 5.47e+3   12
Reinforcement Learning   Gym-MuJoCo Walker2D     Average Return 4.91e+3   10