SkillFlow: Flow-Driven Recursive Skill Evolution for Agentic Orchestration

About

In recent years, a variety of powerful LLM-based agentic systems have been applied to automate complex tasks through task orchestration. However, existing orchestration methods still face key challenges, including strategy collapse under reward maximization, high gradient variance with opaque credit assignment, and unguided skill evolution, where decisions are typically made by directly prompting an LLM to judge rather than derived from principled training signals. To address these challenges, we propose SkillFlow, a flow-based framework that pairs a trainable Supervisor, acting as the agent, with a structured environment comprising a dynamic skill library and a frozen executor, automating task orchestration through multi-turn interaction. SkillFlow employs Tempered Trajectory Balance (TTB), a regression-based flow-matching loss that samples trajectories in proportion to reward, preserving diverse orchestration strategies rather than collapsing to a single mode. The same flow objective yields a jointly learned backward policy that provides transparent per-step credit assignment at zero additional inference cost. Building on these flow diagnostics, a recursive skill evolution mechanism determines when to evolve, what skills to create or prune, and where decision gaps lie -- closing the loop from training signal to autonomous capability growth. Experimental results on 14 datasets show that SkillFlow significantly outperforms baselines across question answering, mathematical reasoning, code generation, and real-world interactive decision-making tasks. Our code is available at https://anonymous.4open.science/r/SkillFlow-E850.
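The abstract does not give the exact form of TTB, so the following is only a minimal sketch of the standard trajectory-balance residual from the GFlowNet literature, on which a regression-based flow loss of this kind is usually built. The function name, the argument names, and in particular the way `temperature` scales the log-reward are assumptions made for illustration, not the authors' implementation.

```python
import math

def trajectory_balance_loss(log_z, log_pf_steps, log_pb_steps, reward, temperature=1.0):
    """Squared trajectory-balance residual for a single trajectory.

    Standard form: (log Z + sum log P_F - log R - sum log P_B)^2.
    ASSUMPTION: the reward is tempered as R^(1/temperature); the source
    abstract does not specify how TTB applies its tempering.
    """
    log_reward = math.log(reward) / temperature
    residual = log_z + sum(log_pf_steps) - log_reward - sum(log_pb_steps)
    return residual ** 2

# A balanced toy trajectory: Z * P_F(tau) equals R * P_B(tau), so the loss vanishes.
balanced = trajectory_balance_loss(
    log_z=math.log(4.0),
    log_pf_steps=[math.log(0.5), math.log(0.5)],  # forward policy, two steps
    log_pb_steps=[0.0, 0.0],                      # deterministic backward policy
    reward=1.0,
)

# Same trajectory with a larger reward: the flow no longer matches, so the loss is positive.
unbalanced = trajectory_balance_loss(
    log_z=math.log(4.0),
    log_pf_steps=[math.log(0.5), math.log(0.5)],
    log_pb_steps=[0.0, 0.0],
    reward=2.0,
)
```

Driving this residual to zero over all trajectories makes the forward policy sample each trajectory with probability proportional to its (tempered) reward, which is the property the abstract credits with preserving diverse strategies.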

Mingda Zhang, Tiesunlong Shen, Haoran Luo, Wenjin Liu, Zikai Xiao, Erik Cambria, Xiaoying Tang • 2026

Related benchmarks

Task	Dataset	Metric	Result
Code Generation	HumanEval OOD	Pass@1	98.44	39
Question Answering	HotpotQA In-Distribution	F1 Score	93.95	23
Web Navigation and E-commerce	WebShop In-Distribution	Average Score	94.73	13
Embodied Task Completion	ALFWorld in-distribution held-out (test)	Success Rate	96.09	9
Interactive Scientific Reasoning	ScienceWorld OOD (test)	Success	57.81	9
Mathematical Problem Solving	MATH Hard OOD (test)	Accuracy	96.09	9
Mathematical Reasoning	AIME In-Distribution 2026	Accuracy	70	9
Medical Question Answering	MedQA In-Distribution	Accuracy	92.19	9
Open-domain Question Answering	NQ-Open OOD (test)	Exact Match (EM)	82.81	9
Question Answering	TriviaQA In-Distribution	Exact Match (EM)	96.09	9

Showing 10 of 14 rows
