FlowBot: Inducing LLM Workflows with Bilevel Optimization and Textual Gradients
About
LLM workflows, which coordinate structured calls to individual LLMs/agents to achieve a particular goal, offer a promising path towards building powerful AI systems that can tackle diverse tasks. However, existing approaches for building such workflows generally rely on human-crafted pipelines and prompts, which presents a substantial bottleneck in real-world deployment. How can we automatically induce LLM-based agents and workflows in a data-driven way? This paper describes a simple data-driven approach for doing so. We formulate workflow induction as a bilevel optimization problem: an outer loop that optimizes a high-level sketch of the workflow (in particular, how the LLM calls should be structured), and an inner loop that optimizes each individual LLM call one by one. Both loops are optimized with "textual gradients"; in the inner loop, each component is optimized in a modular way by "backpropagating" textual gradients layer by layer. We find that LLM workflows discovered through our FlowBot (workflow induction through bilevel optimization and textual gradients) approach perform competitively against strong baselines that make use of human-crafted or generated workflows.
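The bilevel structure described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: every name below (`Step`, `inner_loop`, `induce_workflow`, and the `toy_*` functions) is hypothetical, and the LLM calls that would produce and apply textual feedback are replaced by deterministic stand-ins so the example runs on its own. It assumes a workflow is an ordered list of prompted steps and that a candidate is kept only when a validation score improves.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    name: str    # role of this LLM call in the workflow sketch
    prompt: str  # instruction text, tuned by the inner loop


Workflow = List[Step]


def inner_loop(workflow, score, critique_step, edit_step, rounds=2):
    """Tune each step's prompt, passing textual feedback from the last step backwards."""
    best, best_score = list(workflow), score(workflow)
    for _ in range(rounds):
        candidate = list(best)
        feedback = ""  # feedback arriving from downstream steps
        for i in reversed(range(len(candidate))):
            feedback = critique_step(candidate, i, feedback)
            if feedback:
                candidate[i] = edit_step(candidate[i], feedback)
        cand_score = score(candidate)
        if cand_score > best_score:  # keep only strict improvements
            best, best_score = candidate, cand_score
    return best


def induce_workflow(initial, score, critique_step, edit_step,
                    critique_sketch, edit_sketch, outer_rounds=2):
    """Outer loop: revise the workflow structure, then re-tune prompts with the inner loop."""
    best = inner_loop(initial, score, critique_step, edit_step)
    best_score = score(best)
    for _ in range(outer_rounds):
        feedback = critique_sketch(best)
        candidate = inner_loop(edit_sketch(best, feedback),
                               score, critique_step, edit_step)
        cand_score = score(candidate)
        if cand_score > best_score:
            best, best_score = candidate, cand_score
    return best


# --- Deterministic stand-ins for what would be LLM calls (assumptions) ---

def toy_score(wf):
    """Reward prompts that ask for concision and workflows that include a verifier."""
    tuned = sum("be concise" in s.prompt.lower() for s in wf)
    has_verifier = any(s.name == "verify" for s in wf)
    return 0.5 * tuned / len(wf) + (0.5 if has_verifier else 0.0)


def toy_critique_step(wf, i, downstream_feedback):
    # A real critic would condition on downstream_feedback; this toy ignores it.
    return "" if "be concise" in wf[i].prompt.lower() else "Ask for a concise answer."


def toy_edit_step(step, feedback):
    return Step(step.name, step.prompt + " Be concise.")


def toy_critique_sketch(wf):
    return "" if any(s.name == "verify" for s in wf) else "Add a verification step."


def toy_edit_sketch(wf, feedback):
    return wf + [Step("verify", "Check the answer.")] if feedback else list(wf)


result = induce_workflow(
    [Step("solve", "Answer the question.")],
    toy_score, toy_critique_step, toy_edit_step,
    toy_critique_sketch, toy_edit_sketch,
)
for step in result:
    print(step.name, "->", step.prompt)
```

In this toy run the outer loop adds a verification step and the inner loop rewrites each prompt in reverse order. In a real system, the critic and editor functions would be LLM calls, and the score would come from held-out task examples.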
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Code Generation | HumanEval | Accuracy | 93.74 | 217 |
| Reading Comprehension | DROP | DROP Accuracy | 92.28 | 129 |
| Instruction Following | IFBench (test) | Score | 52.51 | 16 |
| Fact Extraction and Claim Verification | HoVer (test) | Recall | 63.2 | 7 |
| Multi-hop Question Answering | HotpotQA tool-augmented 1 (test) | EM | 72.8 | 7 |
| Privacy-conscious Delegation | PUPA (test) | Score | 90.67 | 7 |