FlowBot: Inducing LLM Workflows with Bilevel Optimization and Textual Gradients

About

LLM workflows, which coordinate structured calls to individual LLMs/agents to achieve a particular goal, offer a promising path towards building powerful AI systems that can tackle diverse tasks. However, existing approaches for building such workflows generally rely on human-crafted pipelines and prompts, and this reliance is a substantial bottleneck in real-world deployment. How can we automatically induce LLM-based agents and workflows in a data-driven way? This paper describes a simple data-driven approach for automatically inducing agents and LLM workflows. We formulate workflow induction as a bilevel optimization problem: an outer loop optimizes a high-level sketch of the workflow (in particular, how the LLM calls should be structured), and an inner loop optimizes each individual LLM call one by one. Both loops optimize with "textual gradients"; in the inner loop, each component is optimized in a modular way by "backpropagating" textual gradients layer by layer. We find that LLM workflows discovered through our FlowBot (workflow induction through bilevel optimization and textual gradients) approach perform competitively against strong baselines that make use of human-crafted or generated workflows.
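The bilevel loop described above can be sketched as follows. This is a toy, offline illustration under strong assumptions, not the FlowBot implementation: a workflow sketch is reduced to the length of a chain of calls, each "LLM call" is a hypothetical stand-in that applies a named string operation, and the critic (`critic`) and prompt optimizer (`FIX`) are hard-coded rules standing in for the LLMs that would produce and apply textual gradients. All names and data here are hypothetical.

```python
# Toy, offline sketch of bilevel workflow induction with textual feedback.
# Every name here is hypothetical; a real system would call LLMs instead.

# Hypothetical stand-in for an LLM call: a node's "prompt" names a string operation.
OPS = {
    "identity": lambda s: s,
    "strip": str.strip,
    "lower": str.lower,
    "reverse": lambda s: s[::-1],
}

# Hypothetical stand-in for an LLM optimizer that rewrites a prompt from feedback.
FIX = {
    "has surrounding whitespace": "strip",
    "has uppercase letters": "lower",
    "is not reversed": "reverse",
}


def run(prompts: list[str], x: str) -> str:
    """Forward pass: apply the chain of calls in order."""
    for p in prompts:
        x = OPS[p](x)
    return x


def score(prompts: list[str], data: list[tuple[str, str]]) -> float:
    """Fraction of examples whose output exactly matches the target."""
    return sum(run(prompts, x) == y for x, y in data) / len(data)


def critic(outputs: list[str], targets: list[str]) -> list[str]:
    """Stand-in for an LLM critic: textual feedback on the final outputs."""
    issues = []
    if any(o != o.strip() for o in outputs):
        issues.append("has surrounding whitespace")
    if any(o != o.lower() for o in outputs):
        issues.append("has uppercase letters")
    if any(o.strip().lower() == t[::-1] and t != t[::-1]
           for o, t in zip(outputs, targets)):
        issues.append("is not reversed")
    return issues


def inner_loop(n_nodes: int, data: list[tuple[str, str]],
               steps: int = 5) -> tuple[list[str], float, list[str]]:
    """Inner level: tune each node's prompt by backpropagating textual feedback."""
    targets = [y for _, y in data]
    prompts = ["identity"] * n_nodes
    best, best_score = list(prompts), score(prompts, data)
    for _ in range(steps):
        feedback = critic([run(prompts, x) for x, _ in data], targets)
        if not feedback:
            break
        # Backward pass from the last node to the first: each node turns one
        # piece of the feedback it receives into a prompt edit and passes the
        # rest upstream; nodes that receive no feedback keep their prompt.
        pending = list(feedback)
        for i in reversed(range(n_nodes)):
            if pending:
                prompts[i] = FIX[pending.pop(0)]
        s = score(prompts, data)
        if s > best_score:
            best, best_score = list(prompts), s
    # Feedback the inner loop could not resolve is reported to the outer level.
    residual = critic([run(best, x) for x, _ in data], targets)
    return best, best_score, residual


def outer_loop(data: list[tuple[str, str]],
               max_nodes: int = 4) -> tuple[list[str], float]:
    """Outer level: search sketches (here only the chain length), scoring each
    sketch at its inner-level optimum."""
    best_prompts, best_score = [], -1.0
    for n in range(1, max_nodes + 1):
        prompts, s, residual = inner_loop(n, data)
        if s > best_score:
            best_prompts, best_score = prompts, s
        # Residual feedback acts as the outer "textual gradient": if any
        # remains, propose a structural edit (one more LLM call).
        if not residual:
            break
    return best_prompts, best_score


DATA = [("  Hello ", "olleh"), ("WORLD", "dlrow"), (" Abc", "cba")]

if __name__ == "__main__":
    print(outer_loop(DATA))  # → (['reverse', 'lower', 'strip'], 1.0)
```

In a real system, `run`, `critic`, and the `FIX` lookup would each be LLM calls, and the outer loop would propose richer structural edits than appending one call. The sketch only shows the control flow: the outer level evaluates each sketch at the optimum found by the inner level, and the inner level pushes feedback from the last call back to the first.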

Hongyeon Yu, Young-Bum Kim, Yoon Kim (2026)

Related benchmarks

Task	Dataset	Metric	Result	#
Code Generation	HumanEval	Accuracy	93.74	224
Reading Comprehension	DROP	DROP Accuracy	92.28	138
Instruction Following	IFBench (test)	Score	52.51	36
Fact Extraction and Claim Verification	HoVer (test)	Recall	63.2	7
Multi-hop Question Answering	HotpotQA tool-augmented 1 (test)	EM	72.8	7
Privacy-conscious Delegation	PUPA (test)	Score	90.67	7
