ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization

About

Recent research has leveraged large language model multi-agent systems for complex problem-solving while trying to reduce the manual effort required to build them, driving the development of automated agent workflow optimization methods. However, existing methods remain inflexible due to representational limitations, a lack of adaptability, and poor scalability when relying on discrete optimization techniques. We address these challenges with ScoreFlow, a simple yet high-performance framework that leverages efficient gradient-based optimization in a continuous space. ScoreFlow incorporates Score-DPO, a novel variant of the direct preference optimization method that accounts for quantitative feedback. Across six benchmarks spanning question answering, coding, and mathematical reasoning, ScoreFlow achieves an 8.2% improvement over existing baselines. Moreover, it empowers smaller models to outperform larger ones with lower inference costs. Project: https://github.com/Gen-Verse/ScoreFlow

Yinjie Wang, Ling Yang, Guohao Li, Mengdi Wang, Bryon Aragam (2025)
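The abstract describes Score-DPO only as a DPO variant that "accounts for quantitative feedback." For orientation, here is a minimal sketch. It implements the standard DPO loss for a single preference pair and adds a hypothetical score-weighted wrapper that shows one way evaluation scores could enter the objective. The weighting rule, the function names (`dpo_loss`, `score_weighted_dpo_loss`), and `beta=0.1` are illustrative assumptions. They are not ScoreFlow's actual Score-DPO formulation, which is defined in the paper and the project repository.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(logp_w: float, logp_l: float,
             ref_logp_w: float, ref_logp_l: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (winner, loser) pair.

    logp_* are sequence log-probabilities under the policy being trained;
    ref_logp_* are the same quantities under the frozen reference model.
    """
    # Implicit reward margin: how much more the policy prefers the winner
    # over the loser than the reference model does, scaled by beta.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -log_sigmoid(margin)


def score_weighted_dpo_loss(logp_w: float, logp_l: float,
                            ref_logp_w: float, ref_logp_l: float,
                            score_w: float, score_l: float,
                            beta: float = 0.1) -> float:
    """HYPOTHETICAL score-aware variant, NOT the paper's Score-DPO.

    Scales the pairwise DPO loss by the evaluation-score gap, so pairs whose
    workflows differ more in measured quality contribute more to the update.
    """
    weight = max(score_w - score_l, 0.0)  # pairs without a positive gap are ignored
    return weight * dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta)


if __name__ == "__main__":
    # Policy identical to the reference: margin is 0, so the loss is log(2).
    print(dpo_loss(-12.0, -15.0, -12.0, -15.0))
    # Policy shifts probability toward the winner: the loss falls below log(2).
    print(dpo_loss(-10.0, -16.0, -12.0, -15.0))
    # A score gap of 0.5 halves this pair's contribution.
    print(score_weighted_dpo_loss(-12.0, -15.0, -12.0, -15.0, 0.9, 0.4))
```

Weighting by the score gap is just one simple way to make the pairwise loss sensitive to how much better the preferred workflow scored. Plain DPO treats every preference pair as equally informative.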

Related benchmarks

Task | Dataset | Metric | Result | Rank
Mathematical Reasoning | AIME 24 | Accuracy | 28.9 | 318
Mathematical Reasoning | AIME 25 | Pass@1 Accuracy | 16.7 | 178
Mathematical Reasoning | AIME 25 | Accuracy | 20 | 112
Code Generation | LiveCodeBench | Accuracy | 25.9 | 84
Reasoning | DROP | Score | 86.14 | 42
Code Generation | CodeContests | Accuracy | 13.3 | 30
Code Generation | APPS | Accuracy | 26.5 | 29
Mathematical Reasoning | AIME 2024 and 2025 (test) | Overall Performance Rate | 57.14 | 18
Scientific Problem Solving | SciBench | Pass@20 | 34.2 | 17
Coding | MBPP | Solve Rate | 82.69 | 15
(Showing 10 of 15 rows.)