Workflow-R1: Group Sub-sequence Policy Optimization for Multi-turn Workflow Construction

About

The rapid evolution of agentic workflows has demonstrated the strong performance of LLM-based agents on complex reasoning tasks. However, existing workflow optimization methods typically formulate workflow synthesis as a static, one-shot, code-centric generation problem. This paradigm imposes excessive constraints on the model's coding capabilities and restricts the flexibility required for dynamic problem-solving. In this paper, we present Workflow-R1, a framework that reformulates workflow construction as a multi-turn, natural-language-based sequential decision-making process. To resolve the optimization granularity mismatch inherent in such multi-turn interactions, we introduce Group Sub-sequence Policy Optimization (GSsPO). While explicitly tailored to align with the interleaved Think-Action dynamics of agentic reasoning, GSsPO fundamentally functions as a structure-aware RL algorithm generalizable to a broad class of multi-turn agentic sequential decision-making tasks. By recalibrating the optimization unit to the composite sub-sequence, specifically the atomic Think-Action cycle, GSsPO aligns gradient updates with the semantic boundaries of these interactions, ensuring robust learning in complex multi-turn reasoning tasks. In extensive experiments on multiple QA benchmarks, Workflow-R1 outperforms competitive baselines, validating GSsPO as a generalized solution for sequential reasoning and establishing Workflow-R1 as a promising new paradigm for automated workflow optimization.
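The abstract does not state GSsPO's objective, so the following is only a hedged illustration of what "recalibrating the optimization unit to the Think-Action sub-sequence" could look like. It assumes a GSPO-style length-normalized importance ratio computed once per Think-Action cycle, s = exp(mean over the cycle's tokens of (log pi_new - log pi_old)), plugged into a PPO-style clipped surrogate, with GRPO-style group-normalized advantages shared by every cycle of a trajectory. The per-token `cycle_ids` labeling, the use of `None` to exclude tokens such as tool observations, the `eps=0.2` clip range, and all function names are hypothetical. The sketch computes the loss value with plain floats; a real trainer would use tensors with autograd.

```python
import math
from statistics import mean, pstdev


def group_advantages(rewards):
    """Group-normalized advantage per trajectory (assumed GRPO-style)."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + 1e-8) for r in rewards]


def split_into_cycles(cycle_ids):
    """Group token positions by the Think-Action cycle they belong to.

    cycle_ids[t] is a hypothetical cycle index for token t; None marks
    tokens excluded from the loss (e.g. environment observations).
    """
    cycles = {}
    for t, c in enumerate(cycle_ids):
        if c is None:
            continue
        cycles.setdefault(c, []).append(t)
    return list(cycles.values())  # ordered by first appearance


def subsequence_ratio(logp_new, logp_old, positions):
    """Length-normalized importance ratio for one sub-sequence:
    the geometric mean of the token-level ratios inside the cycle."""
    diffs = [logp_new[t] - logp_old[t] for t in positions]
    return math.exp(sum(diffs) / len(diffs))


def gsspo_loss(batch, rewards, eps=0.2):
    """Clipped surrogate with one importance ratio per Think-Action cycle.

    batch: one dict per trajectory in the group, holding per-token
    'logp_new', 'logp_old' and 'cycle_ids' lists (assumed layout).
    """
    advs = group_advantages(rewards)
    terms = []
    for traj, adv in zip(batch, advs):
        for positions in split_into_cycles(traj["cycle_ids"]):
            r = subsequence_ratio(traj["logp_new"], traj["logp_old"], positions)
            clipped = min(max(r, 1 - eps), 1 + eps)
            # PPO-style pessimistic bound, applied per sub-sequence.
            terms.append(min(r * adv, clipped * adv))
    # Average over all sub-sequences in the group; negate for minimization.
    return -sum(terms) / len(terms)
```

For example, with two single-cycle trajectories rewarded 1.0 and 0.0, where the first cycle's ratio is e (above the 1.2 clip bound) and the second's is 1, `gsspo_loss` returns roughly -0.1: the positive-advantage cycle is clipped to 1.2 and the negative one contributes -1. The design choice being illustrated is the granularity of the ratio: per token (as in GRPO), per whole trajectory (as in GSPO), or, under this sketch's assumption, per Think-Action cycle.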

Mingze Kong, Zikun Qu, Zhongquan Zhou, Pengyu Liang, Xiang Li, Zhiwei Shang, Zhi Hong, Kaiyu Huang, Zhiyong Wang, Zhongxiang Dai• 2026

Related benchmarks

Task	Dataset	Result
Multi-hop Question Answering	HotpotQA (test)	--	311
Multi-hop Question Answering	2WikiMultiHopQA (test)	EM 63.3	226
Question Answering	NQ (test)	EM Accuracy 43.5	133
Multi-hop Question Answering	Bamboogle (test)	EM 57.6	98
General Question Answering	TriviaQA (test)	EM 73.3	10
General Question Answering	PopQA (test)	EM 49.3	10
