
Prompt-R1: Collaborative Automatic Prompting Framework via End-to-end Reinforcement Learning

About

Recently, advanced large language models (LLMs) have been emerging at an increasingly rapid pace. However, when faced with complex problems, users are often unable to write accurate and effective prompts for interacting with LLMs, which limits the models' performance. To address this challenge, we propose Prompt-R1, an end-to-end reinforcement learning framework in which a small-scale LLM collaborates with a large-scale LLM, standing in for the user to solve problems more effectively. The collaboration is cast as a multi-turn prompt interaction: the small-scale LLM thinks and generates prompts, while the large-scale LLM performs the complex reasoning. A dual-constrained reward is designed to optimize for answer correctness, prompt-generation quality, and reasoning accuracy. Prompt-R1 provides a plug-and-play framework that supports both inference and training with various large-scale LLMs. Experiments on multiple public datasets show that Prompt-R1 significantly outperforms baseline models across tasks. Our code is publicly available at https://github.com/QwenQKing/Prompt-R1.
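The multi-turn collaboration described above can be sketched as a simple loop: the small-scale LLM produces a prompt conditioned on the question and the interaction history, the large-scale LLM answers it, and the exchange is appended to the history. The function names and the stub "models" below are illustrative placeholders, not the actual Prompt-R1 implementation.

```python
# Minimal sketch of the multi-turn prompt interaction, with stub models
# standing in for the small-scale (prompt-writing) and large-scale
# (reasoning) LLMs. All names here are hypothetical.

def small_llm_generate_prompt(question, history):
    """Stand-in for the small-scale policy LLM: thinks, then emits a prompt."""
    turn = len(history) + 1
    return f"[Turn {turn}] Please reason step by step about: {question}"

def large_llm_answer(prompt):
    """Stand-in for the frozen large-scale LLM that performs the reasoning."""
    return f"Reasoning over -> {prompt}"

def collaborate(question, max_turns=3):
    """Run the multi-turn interaction and return the final answer + history."""
    history = []
    for _ in range(max_turns):
        prompt = small_llm_generate_prompt(question, history)
        answer = large_llm_answer(prompt)
        history.append((prompt, answer))
    return history[-1][1], history

final_answer, history = collaborate("What is 12 * 7?")
print(len(history))  # 3 turns of (prompt, answer) pairs
```

In training, the history of prompts and answers would be scored by the dual-constrained reward to update the small-scale LLM's policy; the large-scale LLM is treated as a black box, which is what makes the framework plug-and-play across different backend models.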

Wenjin Liu, Haoran Luo, Xueyuan Lin, Haoming Liu, Tiesunlong Shen, Jiapu Wang, Rui Mao, Erik Cambria • 2025

Related benchmarks

Task                      Dataset           Metric             Result   Rank
General QA                PopQA             Exact Match (EM)   28.13    28
Multi-hop Reasoning       HotpotQA          --                 --       20
Multi-hop Reasoning       TriviaQA          Exact Match (EM)   70.31    17
Mathematical Computation  MathQA            Exact Match (EM)   52.34    10
Mathematical Computation  GSM8K             Exact Match (EM)   97.66    10
Mathematical Computation  DAPO Math         Exact Match (EM)   26.56    10
Multi-hop Reasoning       2WikiMultihopQA   Exact Match (EM)   48.44    10
Standard QA               SQuAD v2          Exact Match (EM)   19.53    10
Standard QA               MuSiQue           Exact Match (EM)   18.75    10
Text Generation           Xsum              F1 Score           25.76    10

Showing 10 of 12 rows
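Most rows above report Exact Match (EM): a prediction scores 1 only if it equals the gold answer after light normalization, and the dataset score is the percentage of matches. The normalization below (lowercasing, stripping punctuation and articles) is a common QA-evaluation convention, not necessarily the exact scheme used by Prompt-R1.

```python
# Illustrative Exact Match (EM) scorer using a common normalization
# convention; the specific normalization steps are an assumption.
import re
import string

def normalize(text):
    """Lowercase, drop punctuation and English articles, squeeze whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, gold):
    """1 if the normalized strings are identical, else 0."""
    return int(normalize(prediction) == normalize(gold))

def em_score(predictions, golds):
    """Percentage of exact matches over the dataset."""
    return 100.0 * sum(map(exact_match, predictions, golds)) / len(golds)

print(em_score(["The Eiffel Tower", "Paris"], ["eiffel tower", "London"]))  # 50.0
```

The Xsum row instead reports an F1-style overlap score, which is the usual choice for summarization-like text generation where exact string equality is too strict.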
