
AFlow: Automating Agentic Workflow Generation

About

Large language models (LLMs) have demonstrated remarkable potential in solving complex tasks across diverse domains, typically by employing agentic workflows that follow detailed instructions and operational sequences. However, constructing these workflows requires significant human effort, limiting scalability and generalizability. Recent research has sought to automate the generation and optimization of these workflows, but existing methods still rely on initial manual setup and fall short of achieving fully automated and effective workflow generation. To address this challenge, we reformulate workflow optimization as a search problem over code-represented workflows, where LLM-invoking nodes are connected by edges. We introduce AFlow, an automated framework that efficiently explores this space using Monte Carlo Tree Search, iteratively refining workflows through code modification, tree-structured experience, and execution feedback. Empirical evaluations across six benchmark datasets demonstrate AFlow's efficacy, yielding a 5.7% average improvement over state-of-the-art baselines. Furthermore, AFlow enables smaller models to outperform GPT-4o on specific tasks at 4.55% of its inference cost in dollars. The code is available at https://github.com/FoundationAgents/AFlow.

Jiayi Zhang, Jinyu Xiang, Zhaoyang Yu, Fengwei Teng, Xionghui Chen, Jiaqi Chen, Mingchen Zhuge, Xin Cheng, Sirui Hong, Jinlin Wang, Bingnan Zheng, Bang Liu, Yuyu Luo, Chenglin Wu• 2024
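The abstract frames workflow optimization as Monte Carlo Tree Search over code-represented workflows. The sketch below is a generic UCT-style MCTS loop illustrating that framing. It is not AFlow's implementation. The operator names, the random `mutate` edit (standing in for an LLM rewriting workflow code), and `toy_score` (standing in for executing a workflow on a validation set) are hypothetical assumptions. Each workflow is simplified to an ordered list of operators with implicit edges i -> i+1, and the tree of already-scored variants is only a loose analogue of the "tree-structured experience" the abstract mentions.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical operator names; in a real system each would be an LLM-invoking node.
OPERATORS = ["generate", "review", "revise", "ensemble", "self_test"]


@dataclass
class Node:
    """A candidate workflow in the search tree (operator sequence; edges run i -> i+1)."""
    workflow: List[str]
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    score: float = 0.0   # this workflow's own evaluation result
    visits: int = 0      # number of evaluations backed up through this node
    total: float = 0.0   # sum of scores backed up through this node


def uct(child: Node, parent_visits: int, c: float = 1.4) -> float:
    """UCT value: mean backed-up score plus an exploration bonus."""
    if child.visits == 0:
        return float("inf")
    return child.total / child.visits + c * math.sqrt(math.log(parent_visits) / child.visits)


def mutate(workflow: List[str], rng: random.Random) -> List[str]:
    """Hypothetical stand-in for an LLM editing workflow code: insert, drop, or replace one operator."""
    wf = list(workflow)
    action = rng.choice(["insert", "drop", "replace"])
    if action == "insert" or len(wf) < 2:
        wf.insert(rng.randrange(len(wf) + 1), rng.choice(OPERATORS))
    elif action == "drop":
        wf.pop(rng.randrange(len(wf)))
    else:
        wf[rng.randrange(len(wf))] = rng.choice(OPERATORS)
    return wf


def toy_score(workflow: List[str]) -> float:
    """Hypothetical stand-in for running a workflow on a validation set."""
    useful = {"generate", "review", "revise", "self_test"}
    coverage = len(useful & set(workflow)) / len(useful)
    return coverage - 0.05 * len(workflow)  # penalize longer (costlier) workflows


def search(seed: List[str], score_fn: Callable[[List[str]], float],
           iterations: int = 200, max_children: int = 3, rng_seed: int = 0) -> Node:
    """Generic MCTS loop: select, expand, evaluate, backpropagate; returns the best-scoring node."""
    rng = random.Random(rng_seed)
    root = Node(workflow=list(seed), score=score_fn(seed), visits=1)
    root.total = root.score
    best = root
    for _ in range(iterations):
        # 1. Selection: follow UCT through nodes that already have max_children children.
        node = root
        while len(node.children) >= max_children:
            parent = node
            node = max(parent.children, key=lambda ch: uct(ch, parent.visits))
        # 2. Expansion: derive a modified workflow from the selected one.
        child = Node(workflow=mutate(node.workflow, rng), parent=node)
        node.children.append(child)
        # 3. Evaluation: score the new workflow (the execution feedback).
        child.score = score_fn(child.workflow)
        # 4. Backpropagation: update statistics on the path back to the root.
        current: Optional[Node] = child
        while current is not None:
            current.visits += 1
            current.total += child.score
            current = current.parent
        if child.score > best.score:
            best = child
    return best


if __name__ == "__main__":
    best = search(["generate"], toy_score, iterations=300)
    print(best.workflow, round(best.score, 3))
```

Running the file prints the highest-scoring workflow the search found; the result depends on `rng_seed` and the toy scoring function. In a real setup, the edit step would prompt an LLM with the parent workflow's code and its score history, and `score_fn` would execute the workflow on held-out examples.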

Related benchmarks

Task | Dataset | Result | Rank
Code Generation | HumanEval | Pass@1 97.78 | 850
Mathematical Reasoning | GSM8K (test) | Accuracy 96.5 | 797
Mathematical Reasoning | GSM8K (test) | Accuracy 94.91 | 751
Mathematical Reasoning | MATH | -- | 643
Mathematical Reasoning | MATH | Accuracy 70.31 | 535
Reasoning | BBH | -- | 507
Code Generation | HumanEval (test) | Pass@1 94.2 | 444
Mathematical Reasoning | MATH (test) | Overall Accuracy 60.1 | 433
Mathematical Reasoning | GSM8K | Accuracy 91.16 | 351
Code Generation | MBPP (test) | Pass@1 82.4 | 276
Showing 10 of 103 rows
