Attributing Emergence in Million-Agent Systems

About

Large language models (LLMs) can simulate human-like reasoning and decision-making in individual agents. LLM-powered multi-agent systems (MAS) combine such agents to simulate population-scale social phenomena such as polarization, information cascades, and market panics. Such studies require attributing macro emergence to individual agents, but existing axiomatic methods scale combinatorially in $N$ and have been confined to $N \lesssim 10^3$, while the phenomena they explain occur at $N \geq 10^6$. We address this gap by adapting Aumann--Shapley path-integral attribution to LLM-powered MAS at million-agent scale; the resulting method satisfies all four axioms, runs three to five orders of magnitude faster than sampled Shapley on the same hardware, and extends feasible axiomatic attribution by over three orders of magnitude (a $1670\times$ jump). We use this method to test the scale gap empirically: across 14 days of public Bluesky data ($1{,}671{,}587$ active users, five topics), we compute the attribution at both full scale and the visibility-biased $N = 10^2$ convenience sample used by small-scale studies, and the two disagree structurally. At full scale the long tail and middle tier jointly carry the majority of the attribution; the biased small panel assigns about twice as large a share to the upper follower tiers as the full-scale computation does ($48\%$ versus $24\%$). We then prove that the disagreement cannot in general be reduced by post-hoc rescaling: an Attribution Scaling Bias theorem shows that a reconciling global rescaling factor exists exactly when the macro indicator is linear over agents, and our nonlinear indicators give residuals of $0.10$--$0.98$. For such nonlinear indicators, full-scale attribution is therefore a requirement rather than a methodological choice.

Ling Tang, Jilin Mei, Qian Chen, Qihan Ren, Linfeng Zhang, Quanshi Zhang, Jing Shao, Xia Hu, Dongrui Liu • 2026
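
To make the Aumann--Shapley path-integral attribution named in the abstract concrete, here is a minimal sketch on a toy problem. It is not the authors' implementation. The softplus indicator, the weights `w`, the agent activities `x`, the all-zero baseline, and the midpoint-rule integration are all assumptions chosen for illustration. The sketch attributes a nonlinear macro indicator f to each agent by integrating its partial derivative along the straight path from the baseline to `x`, then checks that the attributions sum to f(x) - f(0).

```python
import math

def softplus(z):
    return math.log1p(math.exp(z))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def indicator(x, w):
    """Hypothetical nonlinear macro indicator: softplus of weighted agent activity."""
    return softplus(sum(wi * xi for wi, xi in zip(w, x)))

def indicator_grad(x, w):
    """Analytic gradient of the toy indicator with respect to each agent's activity."""
    s = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return [wi * s for wi in w]

def aumann_shapley(x, w, steps=1000):
    """Approximate phi_i = x_i * integral_0^1 df/dx_i(t * x) dt with a midpoint rule.

    The baseline is assumed to be the all-zero vector.
    """
    acc = [0.0] * len(x)
    for k in range(steps):
        t = (k + 0.5) / steps  # midpoint of the k-th sub-interval of [0, 1]
        grad = indicator_grad([t * xi for xi in x], w)
        for i, g in enumerate(grad):
            acc[i] += g
    return [xi * a / steps for xi, a in zip(x, acc)]

# Four illustrative agents; the last one is inactive (x = 0).
x = [0.2, 1.0, 0.5, 0.0]
w = [1.0, 0.5, 2.0, 3.0]

phi = aumann_shapley(x, w)
total = indicator(x, w) - indicator([0.0] * len(x), w)

print("attributions:", phi)
print("sum of attributions:", sum(phi))
print("f(x) - f(0):", total)
```

In this sketch the cost is one gradient evaluation per integration step, so it grows linearly with the number of agents rather than combinatorially as in coalition-based Shapley sampling. The inactive agent receives zero attribution because its path never leaves the baseline.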

Related benchmarks

Task	Dataset	Result
Emergence attribution runtime analysis	Mythos	Wall-clock Runtime (s): 4.9	14
Multi-agent attribution	Multi-agent system (MAS) attribution published experiments	Axioms Satisfied: 4	8
Attribution Runtime Analysis	Mythos f^heat (N=10)	Runtime (s): 4.9	6
Feature Attribution	Synthetic benchmark softplus aggregator nonlinear f (test)	MAE: 4.13	6
Multi-agent attribution	Synthetic Additive Benchmark	MAE: 1.15	6
Feature Attribution	Synthetic quadratic f with cross terms (test)	MAE: 3.4	6
Attribution Runtime Analysis	Mythos f^heat (N=10^3)	Runtime (s): 1.2	4
Deletion-faithfulness	SocialLLM	Deletion AUC: 15.3	4
Attribution Runtime Analysis	Mythos f^heat (N=10^2)	Runtime (s): 5.1	4
Attribution Runtime Analysis	Mythos f^heat (N=10^4)	Runtime (s): 7	4