
Prompt-R1: Collaborative Automatic Prompting Framework via End-to-end Reinforcement Learning

About

Recently, advanced large language models (LLMs) have been emerging at an increasingly rapid pace. However, when faced with complex problems, users are often unable to write accurate and effective prompts for interacting with LLMs, which limits the models' performance. To address this challenge, we propose Prompt-R1, an end-to-end reinforcement learning framework in which a small-scale LLM collaborates with a large-scale LLM, standing in for the user to solve problems more effectively. The collaboration is cast as a multi-turn prompt interaction: the small-scale LLM thinks and generates prompts, while the large-scale LLM performs the complex reasoning. A dual-constrained reward is designed to optimize for answer correctness, prompt-generation quality, and reasoning accuracy. Prompt-R1 provides a plug-and-play framework that supports both inference and training with various large-scale LLMs. Experiments on multiple public datasets show that Prompt-R1 significantly outperforms baseline models across tasks. Our code is publicly available at https://github.com/QwenQKing/Prompt-R1.
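The multi-turn collaboration described above can be sketched as a simple loop: the small-scale LLM produces a prompt conditioned on the question and the interaction history, the large-scale LLM answers it, and the exchange is appended to the history. The function names and the stub "models" below are illustrative placeholders, not the actual Prompt-R1 implementation.

```python
# Minimal sketch of the multi-turn prompt interaction, with stub models
# standing in for the small-scale (prompt-writing) and large-scale
# (reasoning) LLMs. All names here are hypothetical.

def small_llm_generate_prompt(question, history):
    """Stand-in for the small-scale policy LLM: thinks, then emits a prompt."""
    turn = len(history) + 1
    return f"[Turn {turn}] Please reason step by step about: {question}"

def large_llm_answer(prompt):
    """Stand-in for the frozen large-scale LLM that performs the reasoning."""
    return f"Reasoning over -> {prompt}"

def collaborate(question, max_turns=3):
    """Run the multi-turn interaction and return the final answer + history."""
    history = []
    for _ in range(max_turns):
        prompt = small_llm_generate_prompt(question, history)
        answer = large_llm_answer(prompt)
        history.append((prompt, answer))
    return history[-1][1], history

final_answer, history = collaborate("What is 12 * 7?")
print(len(history))  # 3 turns of (prompt, answer) pairs
```

In training, the history of prompts and answers would be scored by the dual-constrained reward to update the small-scale LLM's policy; the large-scale LLM is treated as a black box, which is what makes the framework plug-and-play across different backend models.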

Wenjin Liu, Haoran Luo, Xueyuan Lin, Haoming Liu, Tiesunlong Shen, Jiapu Wang, Rui Mao, Erik Cambria • 2025

Related benchmarks

Task                      Dataset           Metric             Result   Rank
General QA                PopQA             Exact Match (EM)   28.13    28
Multi-hop Reasoning       HotpotQA          --                 --       20
Multi-hop Reasoning       TriviaQA          Exact Match (EM)   70.31    17
Mathematical Computation  MathQA            Exact Match (EM)   52.34    10
Mathematical Computation  GSM8K             Exact Match (EM)   97.66    10
Mathematical Computation  DAPO Math         Exact Match (EM)   26.56    10
Multi-hop Reasoning       2WikiMultihopQA   Exact Match (EM)   48.44    10
Standard QA               SQuAD v2          Exact Match (EM)   19.53    10
Standard QA               MuSiQue           Exact Match (EM)   18.75    10
Text Generation           Xsum              F1 Score           25.76    10

Showing 10 of 12 rows
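Most rows above report Exact Match (EM): a prediction scores 1 only if it equals the gold answer after light normalization, and the dataset score is the percentage of matches. The normalization below (lowercasing, stripping punctuation and articles) is a common QA-evaluation convention, not necessarily the exact scheme used by Prompt-R1.

```python
# Illustrative Exact Match (EM) scorer using a common normalization
# convention; the specific normalization steps are an assumption.
import re
import string

def normalize(text):
    """Lowercase, drop punctuation and English articles, squeeze whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, gold):
    """1 if the normalized strings are identical, else 0."""
    return int(normalize(prediction) == normalize(gold))

def em_score(predictions, golds):
    """Percentage of exact matches over the dataset."""
    return 100.0 * sum(map(exact_match, predictions, golds)) / len(golds)

print(em_score(["The Eiffel Tower", "Paris"], ["eiffel tower", "London"]))  # 50.0
```

The Xsum row instead reports an F1-style overlap score, which is the usual choice for summarization-like text generation where exact string equality is too strict.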
