
Beyond the Proxy: Trajectory-Distilled Guidance for Offline GFlowNet Training

About

Generative Flow Networks (GFlowNets) excel at sampling diverse, high-reward objects. In many practical applications where active reward queries are infeasible, these models must be trained using static offline datasets. Prevailing training methods typically rely on a proxy model to provide reward feedback for online sampled trajectories. However, constructing a reliable proxy is often challenging due to data scarcity or high evaluation costs. While existing proxy-free approaches attempt to address this, they often impose coarse constraints that limit the model's ability to explore effectively. To overcome these limitations, we propose Trajectory-Distilled GFlowNet (TD-GFN), a novel proxy-free training framework. TD-GFN utilizes inverse reinforcement learning (IRL) to extract dense, transition-level edge rewards from offline trajectories, providing rich structural guidance for efficient exploration. Crucially, to ensure robustness, these rewards guide the policy indirectly through DAG pruning and prioritized backward sampling. This design ensures that gradient updates rely exclusively on ground-truth terminal rewards from the dataset, thereby preventing error propagation. Empirical results demonstrate that TD-GFN significantly outperforms a broad range of existing baselines in both convergence speed and sample quality, establishing a more robust and efficient paradigm for offline GFlowNet training.
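The indirect guidance described above (pruning the DAG, then sampling backward with priority from dataset terminal states) can be illustrated on a toy graph. What follows is only a minimal sketch under stated assumptions, not the authors' implementation: the edge rewards are hand-written placeholders standing in for IRL output, and the threshold rule, the "keep the best incoming edge" safeguard, the reward-proportional sampling, and the names `prune_edges` and `sample_backward` are all hypothetical.

```python
import random

def prune_edges(edge_reward, threshold):
    """Keep edges with reward >= threshold (assumed pruning rule).

    The best incoming edge of every child is always kept, so each state
    that had a parent before pruning still has one afterwards.
    """
    kept, best = {}, {}
    for (parent, child), r in edge_reward.items():
        if r >= threshold:
            kept[(parent, child)] = r
        if child not in best or r > best[child][1]:
            best[child] = ((parent, child), r)
    for edge, r in best.values():
        kept.setdefault(edge, r)
    return kept

def sample_backward(terminal, kept, root, rng):
    """Walk from a terminal state back to the root over the pruned DAG.

    Parents are drawn with probability proportional to their (positive)
    edge reward; this weighting is an assumption made for illustration.
    """
    trajectory = [terminal]
    state = terminal
    while state != root:
        parents = [(p, r) for (p, c), r in kept.items() if c == state]
        states, weights = zip(*parents)
        state = rng.choices(states, weights=weights, k=1)[0]
        trajectory.append(state)
    trajectory.reverse()
    return trajectory

# Toy DAG: s0 -> a -> x and s0 -> b -> x, with placeholder edge rewards.
edge_reward = {
    ("s0", "a"): 0.9,
    ("s0", "b"): 0.1,
    ("a", "x"): 0.8,
    ("b", "x"): 0.05,
}
kept = prune_edges(edge_reward, threshold=0.2)
trajectory = sample_backward("x", kept, root="s0", rng=random.Random(0))
```

In this sketch the edge rewards only decide which trajectories are proposed. A training step would then score `trajectory` with the ground-truth terminal reward stored in the dataset for `x`, which is the separation the abstract credits with preventing error propagation.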

Ruishuo Chen, Xun Wang, Rui Hu, Zhuoran Li, Longbo Huang • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Molecule Design | Molecule Design 1,500 samples (train) | Reward (R-10): 7.733 | 13 |
| Listwise Recommendation | ML 1M (test) | Avg. Reward: 1.988 | 4 |