Graph-GRPO: Training Graph Flow Models with Reinforcement Learning

About

Graph generation is a fundamental task with broad applications such as drug discovery. Recently, graph generation based on discrete flow matching, also known as the graph flow model (GFM), has emerged owing to its superior performance and flexible sampling. However, aligning GFMs with complex human preferences or task-specific objectives remains a significant challenge. In this paper, we propose Graph-GRPO, an online reinforcement learning (RL) framework for training GFMs under verifiable rewards. Our method makes two key contributions: (1) we derive an analytical expression for the transition probability of GFMs, which replaces Monte Carlo sampling and enables fully differentiable rollouts for RL training; (2) we propose a refinement strategy that randomly perturbs specific nodes and edges of a graph and regenerates them, enabling localized exploration and self-improvement of generation quality. Extensive experiments on both synthetic and real datasets demonstrate the effectiveness of Graph-GRPO. With only 50 denoising steps, our method achieves Valid-Unique-Novel (V.U.N.) scores of 95.0% and 97.5% on the planar and tree datasets, respectively. Moreover, Graph-GRPO achieves state-of-the-art performance on molecular optimization tasks, outperforming graph-based and fragment-based RL methods as well as classic genetic algorithms.
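The first contribution rests on summing over the denoiser's prediction of the clean graph instead of sampling it. The Python sketch below illustrates that idea for a single node or edge variable, alongside GRPO-style group-standardised advantages and a node-level perturbation for refinement. It is a minimal sketch under stated assumptions, not the authors' implementation: it assumes uniform source noise, a linear interpolation path, the conditional rate R_t(x, j | x_1) = [j = x_1 != x] / (1 - t), and an Euler step of size dt. All function names (analytic_step_probs, monte_carlo_step_probs, rollout_log_prob, group_relative_advantages, perturb_nodes) and the uniform re-noising used in perturb_nodes are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical sketch: illustrative assumptions only, not the Graph-GRPO code.


def analytic_step_probs(x_t: int, p1: Sequence[float], t: float, dt: float) -> List[float]:
    """Closed-form p(x_{t+dt} = j | x_t) for one node or edge variable with K classes.

    Assumes uniform source noise, the linear path p_t(x | x1) = t*[x == x1] + (1 - t)/K,
    and the conditional rate R_t(x, j | x1) = [j == x1 != x] / (1 - t).
    Summing over x1 ~ p1 = p_theta(x1 | x_t) (instead of sampling it) gives
        p(j | x_t)   = p1[j] * dt / (1 - t)               for j != x_t
        p(x_t | x_t) = 1 - (1 - p1[x_t]) * dt / (1 - t)
    """
    if not (0.0 <= t < 1.0 and 0.0 < dt <= 1.0 - t + 1e-12):
        raise ValueError("need 0 <= t < 1 and 0 < dt <= 1 - t")
    h = min(dt / (1.0 - t), 1.0)  # probability of jumping toward x1 in this Euler step
    probs = [p * h for p in p1]
    probs[x_t] = 1.0 - (1.0 - p1[x_t]) * h
    return probs


def monte_carlo_step_probs(x_t: int, p1: Sequence[float], t: float, dt: float,
                           n_samples: int, rng: random.Random) -> List[float]:
    """Same quantity estimated by sampling x1 ~ p1; noisy and not differentiable in p1."""
    h = min(dt / (1.0 - t), 1.0)
    acc = [0.0] * len(p1)
    for _ in range(n_samples):
        x1 = rng.choices(range(len(p1)), weights=p1)[0]
        if x1 == x_t:
            acc[x_t] += 1.0  # already at the sampled target: stay put
        else:
            acc[x1] += h
            acc[x_t] += 1.0 - h
    return [a / n_samples for a in acc]


def rollout_log_prob(steps: Sequence[Tuple[int, int, Sequence[float], float, float]]) -> float:
    """Log-likelihood of a rollout: sum of log p(x_next | x_t) over recorded steps.

    Each step is (x_t, x_next, p1, t, dt) for one variable. In an autodiff framework
    this sum is differentiable w.r.t. the logits that produced p1.
    """
    return sum(math.log(analytic_step_probs(x_t, p1, t, dt)[x_next])
               for x_t, x_next, p1, t, dt in steps)


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """GRPO-style advantages: standardise each reward against its sampling group."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]


def perturb_nodes(node_labels: List[int], adj: List[List[int]], frac: float,
                  num_node_classes: int, num_edge_classes: int,
                  rng: random.Random) -> Tuple[List[int], List[List[int]], List[int]]:
    """Hypothetical refinement step: re-noise a random subset of nodes and their incident
    edges with uniform labels, leaving the rest of the graph fixed for regeneration."""
    n = len(node_labels)
    chosen = sorted(rng.sample(range(n), max(1, round(frac * n))))
    nodes = list(node_labels)
    new_adj = [row[:] for row in adj]
    for i in chosen:
        nodes[i] = rng.randrange(num_node_classes)
        for j in range(n):
            if j != i:
                label = rng.randrange(num_edge_classes)
                new_adj[i][j] = new_adj[j][i] = label  # keep the graph undirected
    return nodes, new_adj, chosen
```

For example, analytic_step_probs(0, [0.2, 0.5, 0.3], t=0.5, dt=0.25) returns approximately [0.6, 0.25, 0.15], which monte_carlo_step_probs only approaches as n_samples grows. Because the closed form is a smooth function of p1, its logarithm can be back-propagated through in an autodiff framework, which is the property the abstract attributes to replacing Monte Carlo sampling.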

Baoheng Zhu, Deyu Bo, Delvin Ce Zhang, Xiao Wang · 2026

Related benchmarks

Task	Dataset	Metric	Result	
Synthetic Graph Generation	Planar Dataset	Degree Statistic	2.00e-4	27
Graph Generation	Planar Graphs (test)	Unique Node %	95	25
Synthetic Graph Generation	Tree Dataset	Degree Similarity	4.00e-4	21
Protein Docking	ZINC250k target: parp1 (test)	Docking Score (top 5%)	-12.515	9
Protein Docking	ZINC250k target: fa7 (test)	Docking Score (top 5%)	-9.099	9
Protein Docking	ZINC250k target: 5ht1b (test)	Docking Score (top 5%)	-11.399	9
Protein Docking	ZINC250k target: jak2 (test)	Docking Score (top 5%)	-11.123	9
Target Property Optimization	PMO benchmark	Albuterol Similarity	99.4	9
Protein Docking	ZINC250k target: braf (test)	Docking Score (top 5%)	-11.141	9
General Graph Generation	Tree (test)	V.U.N.	97.5	7

