
AFlow: Automating Agentic Workflow Generation

About

Large language models (LLMs) have demonstrated remarkable potential in solving complex tasks across diverse domains, typically by employing agentic workflows that follow detailed instructions and operational sequences. However, constructing these workflows requires significant human effort, limiting scalability and generalizability. Recent research has sought to automate the generation and optimization of these workflows, but existing methods still rely on initial manual setup and fall short of achieving fully automated and effective workflow generation. To address this challenge, we reformulate workflow optimization as a search problem over code-represented workflows, where LLM-invoking nodes are connected by edges. We introduce AFlow, an automated framework that efficiently explores this space using Monte Carlo Tree Search, iteratively refining workflows through code modification, tree-structured experience, and execution feedback. Empirical evaluations across six benchmark datasets demonstrate AFlow's efficacy, yielding a 5.7% average improvement over state-of-the-art baselines. Furthermore, AFlow enables smaller models to outperform GPT-4o on specific tasks at 4.55% of its inference cost in dollars. The code is available at https://github.com/FoundationAgents/AFlow.
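The search loop described above can be illustrated with a toy sketch. This is not the authors' implementation: the operator names, the `evaluate` scoring rule, and the random `modify` step are all hypothetical stand-ins (a real system would have an LLM edit workflow code and would score each workflow by running it on a benchmark), and child selection here uses plain UCB1, which may differ from the selection rule AFlow actually uses. It only shows the general shape of select, expand by modification, evaluate, and backpropagate over a tree of workflows.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical operator names; a real workflow would be code, not a tuple.
OPERATORS = ("generate", "review", "revise", "ensemble")
MAX_CHILDREN = 3  # assumed branching limit before the search descends


@dataclass
class Node:
    workflow: tuple                      # stand-in for a code-represented workflow
    score: float                         # execution feedback for this workflow
    parent: Optional["Node"] = None
    children: list = field(default_factory=list)
    visits: int = 1
    total: float = 0.0                   # sum of scores seen in this subtree

    def __post_init__(self):
        self.total = self.score


def evaluate(workflow):
    """Toy score: reward distinct operators, penalize length (assumption)."""
    return len(set(workflow)) / len(OPERATORS) - 0.05 * len(workflow)


def modify(workflow, rng):
    """Stand-in for an LLM edit: replace one operator or append a new one."""
    op = rng.choice(OPERATORS)
    if len(workflow) > 1 and rng.random() < 0.5:
        i = rng.randrange(len(workflow))
        return workflow[:i] + (op,) + workflow[i + 1:]
    return workflow + (op,)


def select(root, c=0.5):
    """Descend by UCB1 until reaching a node that can still be expanded."""
    node = root
    while len(node.children) >= MAX_CHILDREN:
        parent_visits = node.visits
        node = max(
            node.children,
            key=lambda ch: ch.total / ch.visits
            + c * math.sqrt(math.log(parent_visits) / ch.visits),
        )
    return node


def search(rounds=30, seed=0):
    rng = random.Random(seed)
    start = ("generate",)
    root = Node(start, evaluate(start))
    best = root
    for _ in range(rounds):
        node = select(root)
        new_workflow = modify(node.workflow, rng)
        child = Node(new_workflow, evaluate(new_workflow), parent=node)
        node.children.append(child)
        # Backpropagate the child's score to all of its ancestors.
        ancestor = node
        while ancestor is not None:
            ancestor.visits += 1
            ancestor.total += child.score
            ancestor = ancestor.parent
        if child.score > best.score:
            best = child
    return root, best


if __name__ == "__main__":
    root, best = search()
    print(best.workflow, round(best.score, 3))
```

The tree keeps every evaluated workflow, so later rounds can branch from any earlier candidate rather than only from the latest one; that is the sense in which the stored experience is tree-structured in this sketch.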

Jiayi Zhang, Jinyu Xiang, Zhaoyang Yu, Fengwei Teng, Xionghui Chen, Jiaqi Chen, Mingchen Zhuge, Xin Cheng, Sirui Hong, Jinlin Wang, Bingnan Zheng, Bang Liu, Yuyu Luo, Chenglin Wu • 2024

Related benchmarks

Task                     Dataset            Result           Rank
Mathematical Reasoning   GSM8K              --               1362
Code Generation          HumanEval          Pass@1 97.78     1036
Mathematical Reasoning   GSM8K (test)       Accuracy 96.5    900
Mathematical Reasoning   MATH               --               882
Language Understanding   MMLU               Accuracy 69.31   825
Mathematical Reasoning   GSM8K (test)       Accuracy 94.91   770
Reasoning                BBH                --               672
Mathematical Reasoning   MATH               Accuracy 70.31   535
Code Generation          HumanEval (test)   Pass@1 94.2      506
Mathematical Reasoning   GSM8K              Accuracy 91.16   499
Showing 10 of 144 rows
