Graph-GRPO: Training Graph Flow Models with Reinforcement Learning

About

Graph generation is a fundamental task with broad applications, such as drug discovery. Recently, discrete flow matching-based graph generation, also known as the graph flow model (GFM), has gained traction owing to its strong performance and flexible sampling. However, effectively aligning GFMs with complex human preferences or task-specific objectives remains a significant challenge. In this paper, we propose Graph-GRPO, an online reinforcement learning (RL) framework for training GFMs under verifiable rewards. Our method makes two key contributions: (1) we derive an analytical expression for the transition probability of GFMs, which replaces Monte Carlo sampling and enables fully differentiable rollouts for RL training; (2) we propose a refinement strategy that randomly perturbs specific nodes and edges in a graph and then regenerates them, allowing for localized exploration and self-improvement of generation quality. Extensive experiments on both synthetic and real datasets demonstrate the effectiveness of Graph-GRPO. With only 50 denoising steps, our method achieves Valid-Unique-Novelty (V.U.N.) scores of 95.0% and 97.5% on the planar and tree datasets, respectively. Moreover, Graph-GRPO achieves state-of-the-art performance on molecular optimization tasks, outperforming graph-based and fragment-based RL methods as well as classic genetic algorithms.
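
To make the two ingredients named in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a graph is stored as a list of node labels plus a dictionary of edge labels, that "perturbing" means resetting a random subset of them to a placeholder state (the hypothetical `MASK`), and that rewards are normalized within a group of rollouts in the way the GRPO name usually implies. The abstract does not specify the noise distribution or the advantage formula, so `perturb_graph`, `group_relative_advantages`, `frac`, and `MASK` are all illustrative names. Regenerating the masked parts would require the trained GFM and is omitted.

```python
import random
import statistics

# Hypothetical placeholder for a "noised" node/edge state; the abstract does
# not say which noise distribution the refinement strategy actually uses.
MASK = -1


def perturb_graph(node_types, edge_types, frac=0.2, rng=None):
    """Reset a random subset of nodes and edges to MASK (inputs are not modified).

    node_types: list of int labels, one per node.
    edge_types: dict mapping (i, j) pairs to an int label (0 = no edge).
    Returns the perturbed copies plus the indices/keys that were reset, so a
    generative model could later regenerate only those positions.
    """
    rng = rng or random.Random()
    nodes = list(node_types)
    edges = dict(edge_types)

    # Perturb at least one element when possible, never more than exist.
    n_nodes = min(len(nodes), max(1, round(frac * len(nodes))))
    n_edges = min(len(edges), max(1, round(frac * len(edges))))

    node_idx = rng.sample(range(len(nodes)), n_nodes)
    edge_keys = rng.sample(sorted(edges), n_edges)

    for i in node_idx:
        nodes[i] = MASK
    for key in edge_keys:
        edges[key] = MASK
    return nodes, edges, sorted(node_idx), sorted(edge_keys)


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of rollouts (assumed GRPO-style)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]
```

In this sketch, a refinement round would call `perturb_graph` on a previously generated graph, let the model fill in the masked positions, score the resulting group of graphs with a verifiable reward, and pass those scores to `group_relative_advantages`. Rollouts that score above the group mean receive positive advantages and those below receive negative ones.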

Baoheng Zhu, Deyu Bo, Delvin Ce Zhang, Xiao Wang • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Synthetic Graph Generation | Planar Dataset | Degree Statistic | 2.00e-4 | 27 |
| Graph generation | Planar Graphs (test) | Unique Node % | 95 | 25 |
| Synthetic Graph Generation | Tree Dataset | Degree Similarity | 4.00e-4 | 11 |
| Protein Docking | ZINC250k target: parp1 (test) | DS (top 5%) | -12.515 | 9 |
| Protein Docking | ZINC250k target: fa7 (test) | Docking Score (top 5%) | -9.099 | 9 |
| Protein Docking | ZINC250k target: 5ht1b (test) | DS (top 5%) | -11.399 | 9 |
| Protein Docking | ZINC250k target: jak2 (test) | DS (top 5%) | -11.123 | 9 |
| Target property optimization | PMO benchmark | Albuterol Similarity | 99.4 | 9 |
| Protein Docking | ZINC250k target: braf (test) | DS (top 5%) | -11.141 | 9 |
| General Graph Generation | Tree (test) | V.U.N. | 97.5 | 7 |
