
Workflow-R1: Group Sub-sequence Policy Optimization for Multi-turn Workflow Construction

About

The rapid evolution of agentic workflows has demonstrated the strong performance of LLM-based agents on complex reasoning tasks. However, existing workflow optimization methods typically formulate workflow synthesis as a static, one-shot, code-centric generation problem. This paradigm places excessive demands on the model's coding capabilities and restricts the flexibility required for dynamic problem-solving. In this paper, we present Workflow-R1, a framework that reformulates workflow construction as a multi-turn, natural-language-based sequential decision-making process. To resolve the mismatch in optimization granularity inherent in such multi-turn interactions, we introduce Group Sub-sequence Policy Optimization (GSsPO). Although it is explicitly tailored to the interleaved Think-Action dynamics of agentic reasoning, GSsPO is fundamentally a structure-aware RL algorithm that generalizes to a broad class of multi-turn agentic sequential decision-making tasks. By recalibrating the optimization unit to the composite sub-sequence, specifically the atomic Think-Action cycle, GSsPO aligns gradient updates with the semantic boundaries of these interactions, ensuring robust learning in complex multi-turn reasoning tasks. In extensive experiments on multiple QA benchmarks, Workflow-R1 outperforms competitive baselines, validating GSsPO as a generalized solution for sequential reasoning and establishing Workflow-R1 as a promising new paradigm for automated workflow optimization.
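The abstract does not state the GSsPO objective, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a design in the spirit of GRPO and GSPO: a group-relative advantage computed from the scalar rewards of several rollouts for the same question, and one length-normalized importance ratio per Think-Action cycle (rather than per token or per whole trajectory), fed into a PPO-style clipped surrogate. It also assumes that tokens returned by the environment (for example, retrieved passages) lie outside the cycle spans and are ignored. The names `group_advantages`, `subsequence_ratios`, and `gsspo_style_loss`, the span format, and the default `clip_eps=0.2` are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical representation: a [start, end) token span marking one
# Think-Action cycle inside a rollout. Spans are assumed to be non-empty.
Span = Tuple[int, int]
# One rollout: (per-token log-probs under the current policy,
#               per-token log-probs under the old/behavior policy,
#               spans of its Think-Action cycles).
Rollout = Tuple[Sequence[float], Sequence[float], Sequence[Span]]


def group_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Group-relative advantage (GRPO-style, an assumption here):
    standardize each rollout's reward against the other rollouts
    sampled for the same question."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]


def subsequence_ratios(logp_new: Sequence[float],
                       logp_old: Sequence[float],
                       spans: Sequence[Span]) -> List[float]:
    """One importance ratio per Think-Action cycle: the geometric mean of
    the token-level ratios inside that cycle, i.e. exp of the mean
    log-prob difference. Tokens outside every span are ignored."""
    ratios = []
    for start, end in spans:
        diff = sum(logp_new[t] - logp_old[t] for t in range(start, end))
        ratios.append(math.exp(diff / (end - start)))
    return ratios


def gsspo_style_loss(batch: Sequence[Rollout],
                     rewards: Sequence[float],
                     clip_eps: float = 0.2) -> float:
    """Clipped surrogate averaged over every cycle of every rollout in the
    group. Each cycle inherits its rollout's advantage; clipping is applied
    per cycle, so one off-policy cycle does not clip the whole trajectory."""
    advs = group_advantages(rewards)
    terms = []
    for (logp_new, logp_old, spans), adv in zip(batch, advs):
        for ratio in subsequence_ratios(logp_new, logp_old, spans):
            clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
            terms.append(min(ratio * adv, clipped * adv))
    # Negate so that minimizing this loss maximizes the surrogate objective.
    return -sum(terms) / len(terms)


if __name__ == "__main__":
    # Two rollouts for one question: rollout 0 is correct (reward 1) and has
    # two Think-Action cycles; rollout 1 is wrong (reward 0) and has one.
    batch = [
        ([-0.5, -0.4, -0.3, -0.2], [-0.5, -0.4, -0.3, -0.2], [(0, 2), (2, 4)]),
        ([-0.9, -0.8, -0.7], [-0.9, -0.8, -0.7], [(0, 3)]),
    ]
    print(gsspo_style_loss(batch, rewards=[1.0, 0.0]))
```

Averaging over all cycles (rather than first within each rollout) is one of several reasonable choices; it weights rollouts with more Think-Action cycles more heavily, which the paper may or may not do.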

Mingze Kong, Zikun Qu, Zhongquan Zhou, Pengyu Liang, Xiang Li, Zhiwei Shang, Zhi Hong, Kaiyu Huang, Zhiyong Wang, Zhongxiang Dai • 2026

Related benchmarks

Task                          Dataset                 Result            Rank
Multi-hop Question Answering  HotpotQA (test)         --                198
Multi-hop Question Answering  2WikiMultiHopQA (test)  EM 63.3           143
Question Answering            NQ (test)               EM Accuracy 43.5  66
Multi-hop Question Answering  Bamboogle (test)        EM 57.6           46
General Question Answering    TriviaQA (test)         EM 73.3           10
General Question Answering    PopQA (test)            EM 49.3           10
