AgentConductor: Topology Evolution for Multi-Agent Competition-Level Code Generation

About

Large language model (LLM)-driven multi-agent systems (MAS) coordinate specialized agents through predefined interaction topologies and have shown promise for complex tasks such as competition-level code generation. Recent studies demonstrate that carefully designed multi-agent workflows and communication graphs can significantly improve code generation performance by leveraging collaborative reasoning. However, existing methods neither adapt topology density to task difficulty nor iteratively refine the topology within an instance using execution feedback, which leads to redundant communication and performance bottlenecks. To address these issues, we propose AgentConductor, a reinforcement learning-optimized MAS with an LLM-based orchestrator agent at its core, which enables end-to-end, feedback-driven dynamic generation of interaction topologies. For each query, AgentConductor infers agent roles and task difficulty, then constructs a task-adapted, density-aware layered directed acyclic graph (DAG) topology, underpinned by two key innovations. First, we design a novel topological density function that captures communication-aware mathematical characterizations of multi-agent interactions. Second, we adopt difficulty interval partitioning, which avoids excessive pruning, allows the topological density upper bound to be measured precisely for each difficulty level, and gives finer-grained control. Empirically, across three competition-level and two foundational code datasets, AgentConductor achieves state-of-the-art accuracy, outperforming the strongest baseline by up to 14.6% in pass@1 accuracy, 13% in density reduction, and 68% in token cost reduction.
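To make the idea of a density-aware layered DAG concrete, here is a minimal Python sketch. The abstract does not define the paper's density function, so the ratio used here (edges used divided by the maximum number of forward edges between layers) is only an assumed stand-in, and the agent names and the `max_forward_edges` and `density` helpers are hypothetical.

```python
from itertools import combinations


def max_forward_edges(layers):
    """Count every possible edge from an agent to an agent in a later layer."""
    sizes = [len(layer) for layer in layers]
    return sum(a * b for a, b in combinations(sizes, 2))


def density(layers, edges):
    """Assumed density: fraction of allowed forward edges actually used.

    This is an illustrative ratio, not the density function from the paper.
    """
    layer_of = {agent: i for i, layer in enumerate(layers) for agent in layer}
    for src, dst in edges:
        # Edges may only point to a strictly later layer, which keeps the graph acyclic.
        if layer_of[src] >= layer_of[dst]:
            raise ValueError(f"edge {src} -> {dst} does not point to a later layer")
    capacity = max_forward_edges(layers)
    return len(set(edges)) / capacity if capacity else 0.0


# Hypothetical three-layer topology: one planner, two coders, one tester.
layers = [["planner"], ["coder_a", "coder_b"], ["tester"]]

# A sparse topology, as might suit an easy task.
sparse = [("planner", "coder_a"), ("coder_a", "tester")]

# A fully connected forward topology, as might suit a hard task.
dense = [
    (src, dst)
    for i, upper in enumerate(layers)
    for lower in layers[i + 1:]
    for src in upper
    for dst in lower
]

print(density(layers, sparse))
print(density(layers, dense))
```

Under this assumed definition, a per-difficulty density upper bound would simply be a threshold on the returned ratio; how the actual system sets and enforces that bound is not specified in the abstract.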

Siyu Wang, Ruotian Lu, Zhihao Yang, Yuchao Wang, Yanzhou Zhang, Lei Xu, Qimin Xu, Guojun Yin, Cailian Chen, Xinping Guan • 2026

Related benchmarks

Task	Dataset	Result
Code Generation	HumanEval (test)	Pass@1 97.5	612
Code Generation	MBPP (test)	Pass@1 95.1	405
Code Generation	CodeContests (test)	Pass@1 38.8	68
Code Generation	APPS (test)	--	36
Code Generation	LiveCodeBench V4 (test)	Pass@1 46.3	14
Multi-hop reasoning and question answering	GAIA	GAIA L1 Acc 72	2
Multi-hop reasoning and question answering	HLE	HLE Average 22.6	2
Multi-hop reasoning and question answering	PopQA	PopQA Score 50.3	2

