Attributing Emergence in Million-Agent Systems

About

Large language models (LLMs) can simulate human-like reasoning and decision-making in individual agents. LLM-powered multi-agent systems (MAS) combine such agents to simulate population-scale social phenomena such as polarization, information cascades, and market panics. Such studies require attributing macro emergence to individual agents, but existing axiomatic methods scale combinatorially in $N$ and have been confined to $N \lesssim 10^3$, while the phenomena they explain occur at $N \geq 10^6$. We address this gap by adapting Aumann--Shapley path-integral attribution to LLM-powered MAS at million-agent scale; the resulting method satisfies all four axioms and runs four to five orders of magnitude faster than sampled Shapley on the same hardware. We use this method to test the scale gap empirically: across 14 days of public Bluesky data ($1{,}671{,}587$ active users), we compute the attribution both at full scale and on the visibility-biased $N = 10^2$ convenience sample used by small-scale studies, and the two disagree structurally. At full scale the long tail and middle tier jointly carry the majority of the attribution; the biased small panel attributes almost everything to a few high-follower accounts. We then prove that under any nonlinear macro indicator the disagreement cannot be reduced by post-hoc rescaling: an Attribution Scaling Bias theorem shows that no global rescaling factor can reconcile small-scale and full-scale attribution. Full-scale attribution is therefore not a methodological choice but a theoretical requirement for any nonlinear macro indicator.
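For readers unfamiliar with the technique named above, the following is a minimal sketch of textbook Aumann--Shapley attribution along the straight-line path from a zero baseline, $\phi_i = x_i \int_0^1 \partial_i f(t x)\, dt$. It is not the authors' implementation: the indicator `softplus_heat`, the finite-difference gradient, the zero baseline, and the toy activity vector are all assumptions chosen for illustration.

```python
import math

def softplus_heat(x):
    """Hypothetical nonlinear macro indicator: softplus of total activity."""
    return math.log1p(math.exp(sum(x) - 1.0))

def grad(f, x, eps=1e-6):
    """Central-difference gradient of f at x (stand-in for autodiff)."""
    g = []
    for i in range(len(x)):
        hi, lo = list(x), list(x)
        hi[i] += eps
        lo[i] -= eps
        g.append((f(hi) - f(lo)) / (2 * eps))
    return g

def aumann_shapley(f, x, steps=200):
    """Midpoint-rule estimate of phi_i = x_i * integral_0^1 df/dx_i(t*x) dt."""
    phi = [0.0] * len(x)
    for k in range(steps):
        t = (k + 0.5) / steps              # midpoint of the k-th sub-interval
        g = grad(f, [t * xi for xi in x])  # gradient at the scaled point t*x
        for i in range(len(x)):
            phi[i] += x[i] * g[i] / steps
    return phi

# Toy activity levels for four agents (assumed values); the last is inactive.
x = [0.2, 0.5, 1.3, 0.0]
phi = aumann_shapley(softplus_heat, x)
total = softplus_heat(x) - softplus_heat([0.0] * len(x))

print("attributions:", phi)
print("sum of attributions:", sum(phi), "f(x) - f(0):", total)
```

In this sketch the attributions sum to $f(x) - f(0)$ up to numerical error, and the inactive agent receives exactly zero. The path integral needs one gradient per integration step rather than an enumeration of coalitions; with automatic differentiation in place of the finite-difference helper used here, the cost per step would be roughly one backward pass regardless of $N$.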

Ling Tang, Jilin Mei, Qian Chen, Qihan Ren, Linfeng Zhang, Quanshi Zhang, Jing Shao, Xia Hu, Dongrui Liu • 2026

Related benchmarks

Task | Dataset | Result | Rank
Emergence attribution runtime analysis | Mythos | Wall-clock Runtime (s): 4.9 | 14
Multi-agent attribution | Multi-agent system (MAS) attribution published experiments | Axioms Satisfied: 4 | 8
Attribution Runtime Analysis | Mythos f^heat (N=10) | Runtime (s): 4.9 | 6
Feature Attribution | Synthetic benchmark softplus aggregator nonlinear f (test) | MAE: 4.13 | 6
Multi-agent attribution | Synthetic Additive Benchmark | MAE: 1.15 | 6
Feature Attribution | Synthetic quadratic f with cross terms (test) | MAE: 3.4 | 6
Attribution Runtime Analysis | Mythos f^heat (N=10^3) | Runtime (s): 1.2 | 4
Deletion-faithfulness | SocialLLM | Deletion AUC: 15.3 | 4
Attribution Runtime Analysis | Mythos f^heat (N=10^2) | Runtime (s): 5.1 | 4
Attribution Runtime Analysis | Mythos f^heat (N=10^4) | Runtime (s): 7 | 4
Showing 10 of 14 rows
