
GFlowPO: Generative Flow Network as a Language Model Prompt Optimizer

About

Finding effective prompts for language models (LMs) is critical yet notoriously difficult: the prompt space is combinatorially large, and rewards are sparse because target-LM evaluation is expensive. Moreover, existing RL-based prompt optimizers often rely on on-policy updates and a meta-prompt sampled from a fixed distribution, leading to poor sample efficiency. We propose GFlowPO, a probabilistic prompt optimization framework that casts prompt search as a posterior inference problem over latent prompts, regularized by a meta-prompted reference-LM prior. In the first step, we fine-tune a lightweight prompt-LM with an off-policy Generative Flow Network (GFlowNet) objective, using a replay-based training policy that reuses past prompt evaluations to enable sample-efficient exploration. In the second step, we introduce Dynamic Memory Update (DMU), a training-free mechanism that updates the meta-prompt by injecting both (i) diverse prompts from a replay buffer and (ii) top-performing prompts from a small priority queue, thereby progressively concentrating the search on high-reward regions. Across few-shot text classification, instruction induction benchmarks, and question answering tasks, GFlowPO consistently outperforms recent discrete prompt optimization baselines.
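The abstract's second step, Dynamic Memory Update, can be illustrated with a minimal sketch. This is not the authors' implementation: the class name, buffer sizes, and the meta-prompt template are assumptions made for illustration. The sketch shows the core idea only, which is that each meta-prompt update mixes randomly drawn (diverse) prompts from a replay buffer with the highest-reward prompts retained in a small priority queue.

```python
import heapq
import random

class DynamicMemoryUpdate:
    """Hypothetical sketch of the DMU mechanism described in the abstract:
    build the next meta-prompt from (i) diverse prompts sampled from a
    replay buffer and (ii) top-reward prompts kept in a small priority queue."""

    def __init__(self, queue_size=4):
        self.replay_buffer = []    # all evaluated (prompt, reward) pairs
        self.priority_queue = []   # min-heap of (reward, prompt); keeps the best few
        self.queue_size = queue_size

    def record(self, prompt, reward):
        """Store a newly evaluated prompt; evict the weakest queue entry if full."""
        self.replay_buffer.append((prompt, reward))
        heapq.heappush(self.priority_queue, (reward, prompt))
        if len(self.priority_queue) > self.queue_size:
            heapq.heappop(self.priority_queue)  # drop the lowest-reward entry

    def build_meta_prompt(self, n_diverse=2, seed=None):
        """Training-free update: inject diverse and top-performing prompts."""
        rng = random.Random(seed)
        k = min(n_diverse, len(self.replay_buffer))
        diverse = [p for p, _ in rng.sample(self.replay_buffer, k)]
        top = [p for _, p in sorted(self.priority_queue, reverse=True)]
        examples = "\n".join(f"- {p}" for p in diverse + top)
        return (
            "Here are some candidate prompts:\n"
            f"{examples}\n"
            "Write an improved prompt:"
        )
```

Because the priority queue always contributes the best prompts found so far while the replay buffer contributes random past prompts, successive meta-prompts concentrate on high-reward regions without any gradient update to the prompt-LM.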

Junmo Cho, Suhan Kim, Sangjune An, Minsu Kim, Dong Bok Lee, Heejun Lee, Sung Ju Hwang, Hae Beom Lee • 2026

Related benchmarks

Task | Dataset | Result | Rank
Question Answering | OpenBookQA | Accuracy 76.2 | 465
Question Answering | MMLU | Accuracy 55.6 | 62
Few-shot Text Classification | GLUE, SuperGLUE, SNLI subsets (test) | SST-2 Accuracy 93 | 12
Instruction Induction | Instruction Induction (test) | -- | 10
Instruction Induction | BigBench Instruction Induction (BBII) (test) | BBII Text Classification Score 60.14 | 6
