Multi-agent AI systems outperform human teams in creativity

About

Although artificial intelligence (AI) now matches or exceeds human performance across numerous cognitive tasks, creativity remains a highly contested frontier. As AI systems based on large language models (LLMs) are increasingly adopted in research and innovation, it is essential to understand and augment their creativity. Here we demonstrate that multi-agent LLM teams not only surpass single agents, but also substantially outperform human teams in creativity (Cohen's d=1.50) across 4,541 multi-agent LLM ideas and 341 human-team ideas on six diverse problem-solving tasks. This advantage is driven by novelty while maintaining comparable usefulness. To investigate the generative processes in both groups, we represent conversations as paths through semantic space using neural language model representations. Both LLM and human teams produce more creative ideas when conversations range widely rather than staying centered on a single theme (low global coherence). However, the additional patterns that predict creativity differ: LLM teams benefit from efficient exploration (high semantic spread, shorter paths), while human teams benefit from maintaining smooth conversational flow (high local coherence, frequent pivots). Additionally, we identify model choice and discussion structure as orthogonal design levers that together explain 26.8% of variance in LLM conversational dynamics, paving the way for systematic approaches to developing multi-agent systems with augmented creative capabilities.

Tiancheng Hu, Yixuan Jiang, Haotian Li, José Hernández-Orallo, Xing Xie, Nigel Collier, David Stillwell, Luning Sun • 2026
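
The abstract describes conversations as paths through semantic space and names several path properties (global coherence, local coherence, semantic spread, path length, pivots) without defining them. The sketch below shows one plausible way such quantities could be computed from per-turn embedding vectors. Every definition in it is an assumption made for illustration, not the authors' method: the function name `path_metrics`, the use of cosine similarity, the centroid-based notion of global coherence, and the `pivot_threshold` parameter are all hypothetical.

```python
import numpy as np


def path_metrics(embeddings, pivot_threshold=0.5):
    """Illustrative path metrics for one conversation.

    `embeddings` holds one vector per conversational turn, in order.
    All definitions here are assumptions, not taken from the paper.
    """
    x = np.asarray(embeddings, dtype=float)
    if x.ndim != 2 or x.shape[0] < 2:
        raise ValueError("need at least two turn embeddings")

    # Normalise rows so that dot products equal cosine similarities.
    x = x / np.linalg.norm(x, axis=1, keepdims=True)

    # Cosine similarity between each turn and the next one.
    step_sim = np.sum(x[:-1] * x[1:], axis=1)

    # Direction of the conversation's centroid (assumed non-zero).
    centroid = x.mean(axis=0)
    centroid = centroid / np.linalg.norm(centroid)

    # Cosine similarity for every unordered pair of turns.
    pair_sim = x @ x.T
    upper = np.triu_indices(len(x), k=1)

    return {
        # Assumed: mean similarity of consecutive turns.
        "local_coherence": float(step_sim.mean()),
        # Assumed: mean similarity of each turn to the centroid.
        "global_coherence": float((x @ centroid).mean()),
        # Assumed: mean pairwise cosine distance between turns.
        "semantic_spread": float((1.0 - pair_sim[upper]).mean()),
        # Assumed: total cosine distance travelled turn to turn.
        "path_length": float((1.0 - step_sim).sum()),
        # Assumed: number of steps whose similarity drops below the threshold.
        "pivots": int((step_sim < pivot_threshold).sum()),
    }


# Toy 2-D "embeddings": one conversation stays on a theme, one alternates topics.
focused = path_metrics([[1.0, 0.0], [1.0, 0.1], [1.0, 0.0], [1.0, 0.1]])
wide = path_metrics([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
print(focused)
print(wide)
```

Under these assumed definitions, the alternating conversation has lower global coherence and more pivots than the focused one. Real turn embeddings would come from a neural language model, as the abstract indicates; which model and which exact metric definitions the authors used is not stated here.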

Related benchmarks

Task                    Dataset                               Result                   Rank
Idea Generation         Education Inequality                  Creativity: 10           4
Idea Generation         Plastic Waste                         Creativity: 10           4
Creativity Evaluation   Creativity Evaluation (Overall)       Creativity Score: 0.3    2
Creativity Evaluation   Employee Attrition Specific prompt    Creativity: 28           2
Creativity Evaluation   Singing in Shower Specific prompt     Creativity Score: 24     2
Creativity Evaluation   Sorry Pandemic Specific prompt        Creativity: 0.28         2
Creativity Evaluation   Supply Chain Specific prompt          Creativity: 0.33         2
Idea Generation         Sorry Pandemic                        Creativity: 10           2
Idea Generation         Supply Chain                          Creativity: 10           2
Idea Generation         Employee Attrition                    Creativity: 10           2
Showing 10 of 11 rows
