FlowSteer: Towards Agents Designing Agentic Workflows via Reinforced Progressive Canvas Editing
About
In recent years, agentic workflows have been widely applied to solve complex human tasks. However, existing workflow construction still faces key challenges: it depends on human design, it lacks graph-level execution feedback, and it cannot repair errors in-loop during long-horizon construction. To address these challenges, we propose FlowSteer, a new paradigm of Agent Designing Agentic Workflows, in which a single agent designs, end to end, the workflow that a downstream executor runs. To support this paradigm, we introduce the Workflow Canvas, a novel executable graph-state environment that returns syntax-checked execution feedback for every atomic edit. Built on the canvas, we further propose Reinforced Progressive Canvas Editing, in which a lightweight policy agent issues one atomic edit per turn, conditioned on real canvas feedback, and is trained end-to-end via reinforcement learning. Moreover, FlowSteer provides a plug-and-play framework that supports diverse operator libraries and interchangeable LLM backends. Experimental results on twelve datasets show that FlowSteer significantly outperforms baselines across various tasks. Our code is available at https://anonymous.4open.science/r/FlowSteer-9B2E.
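The one-edit-per-turn loop described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the FlowSteer code: the `Edit` and `Canvas` classes, the edit kinds (`add_node`, `add_edge`, `finish`), the operator names, and the feedback strings are all hypothetical. A scripted policy stands in for the learned agent, and the reinforcement learning step is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class Edit:
    """One atomic canvas edit (hypothetical schema, not FlowSteer's)."""
    kind: str            # "add_node", "add_edge", or "finish"
    args: tuple = ()


@dataclass
class Canvas:
    """Toy graph-state environment that validates each edit before applying it."""
    allowed_ops: frozenset
    nodes: dict = field(default_factory=dict)   # node name -> operator name
    edges: set = field(default_factory=set)     # (source, target) pairs

    def apply(self, edit):
        """Apply one edit and return (ok, feedback); a rejected edit leaves the state unchanged."""
        if edit.kind == "add_node":
            name, op = edit.args
            if op not in self.allowed_ops:
                return False, f"unknown operator: {op}"
            if name in self.nodes:
                return False, f"duplicate node: {name}"
            self.nodes[name] = op
            return True, f"added node {name}"
        if edit.kind == "add_edge":
            src, dst = edit.args
            missing = [n for n in (src, dst) if n not in self.nodes]
            if missing:
                return False, f"missing node(s): {', '.join(missing)}"
            self.edges.add((src, dst))
            return True, f"added edge {src}->{dst}"
        if edit.kind == "finish":
            if not self.nodes:
                return False, "empty workflow"
            return True, "workflow complete"
        return False, f"unknown edit kind: {edit.kind}"


def build_workflow(policy, canvas, max_turns=10):
    """Run the loop: one edit per turn, with the policy seeing all prior feedback."""
    history = []
    for _ in range(max_turns):
        edit = policy(canvas, history)
        ok, feedback = canvas.apply(edit)
        history.append((edit, ok, feedback))
        if ok and edit.kind == "finish":
            break
    return history


def scripted_policy(canvas, history):
    """Stand-in for the learned policy: replays a fixed plan that includes one repair."""
    plan = [
        Edit("add_node", ("plan", "Planner")),
        Edit("add_edge", ("plan", "solve")),    # rejected: "solve" does not exist yet
        Edit("add_node", ("solve", "Solver")),  # repair: create the missing node
        Edit("add_edge", ("plan", "solve")),    # retry now succeeds
        Edit("finish"),
    ]
    return plan[len(history)]


canvas = Canvas(allowed_ops=frozenset({"Planner", "Solver"}))
for edit, ok, feedback in build_workflow(scripted_policy, canvas):
    print(edit.kind, ok, feedback)
```

The second turn is rejected with feedback, and the following turns repair the error inside the same episode. This mirrors the in-loop repair the abstract describes, although a real policy would choose its edits from the feedback rather than from a fixed plan.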
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Code Generation | HumanEval | Pass@1 | 92.96 | 1043 |
| Mathematical Reasoning | MATH | Accuracy | 81.25 | 535 |
| Mathematical Reasoning | MathQA | Accuracy | 88.67 | 354 |
| Mathematical Reasoning | AIME 2025 | Accuracy | 26.67 | 227 |
| Question Answering | SQuAD 2.0 | F1 | 83.67 | 215 |
| Question Answering | HotpotQA | F1 | 84.98 | 132 |
| Code Generation | APPS | Pass@1 | 49.21 | 111 |
| Question Answering | TriviaQA | F1 | 84.11 | 46 |
| Question Answering | NaturalQuestions | F1 | 62.56 | 42 |
| Code Generation | HumanEval OOD | Pass@1 | 93.75 | 39 |