Improving Role Consistency in Multi-Agent Collaboration via Quantitative Role Clarity

About

In large language model (LLM)-driven multi-agent systems, disobeying the role specification (failing to adhere to the defined responsibilities and constraints of an assigned role, potentially leading an agent to behave like another) is a major failure mode \cite{DBLP:journals/corr/abs-2503-13657}. To address this issue, we propose a quantitative measure of role clarity to improve role consistency. First, we construct a role assignment matrix $S(\phi)=[s_{ij}(\phi)]$, where $s_{ij}(\phi)$ is the semantic similarity between the $i$-th agent's behavior trajectory and the $j$-th agent's role description. We then define the role clarity matrix $M(\phi)$ as $\text{softmax}(S(\phi))-I$, where $\text{softmax}(S(\phi))$ is the row-wise softmax of $S(\phi)$ and $I$ is the identity matrix. The Frobenius norm of $M(\phi)$ quantifies the alignment between agents' role descriptions and their behavior trajectories. Moreover, we employ the role clarity matrix as a regularizer during lightweight fine-tuning to improve role consistency, thereby improving end-to-end task performance. Experiments on the ChatDev multi-agent system show that our method substantially improves role consistency and task performance. With Qwen and Llama, respectively, the role overstepping rate decreases from $46.4\%$ to $8.4\%$ and from $43.4\%$ to $0.2\%$, the role clarity score increases from $0.5328$ to $0.9097$ and from $0.5007$ to $0.8530$, and the task success rate increases from $0.6769$ to $0.6909$ and from $0.6174$ to $0.6763$.

Guoling Zhou, Wenpei Han, Fengqin Yang, Li Wang, Yingcong Zhou, Zhiguo Fu • 2026
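
The role clarity matrix described in the abstract can be illustrated with a minimal sketch. It assumes the similarity matrix $S$ is already available as a square list of numbers; the function names and the two example matrices are hypothetical and are not taken from the paper. How the paper computes the similarities, whether it scales them before the softmax, and how it maps the norm to the reported role clarity score are not stated in the abstract, so none of that is modeled here.

```python
import math


def row_softmax(matrix):
    """Row-wise softmax; the row maximum is subtracted for numerical stability."""
    result = []
    for row in matrix:
        shift = max(row)
        exps = [math.exp(x - shift) for x in row]
        total = sum(exps)
        result.append([e / total for e in exps])
    return result


def role_clarity_matrix(similarity):
    """Return M = softmax(S) - I for a square similarity matrix S."""
    probs = row_softmax(similarity)
    return [
        [p - (1.0 if i == j else 0.0) for j, p in enumerate(row)]
        for i, row in enumerate(probs)
    ]


def frobenius_norm(matrix):
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(x * x for row in matrix for x in row))


# Hypothetical similarity scores for three agents (illustrative values only).
# Row i holds the similarity of agent i's trajectory to each role description.
aligned = [
    [8.0, 0.0, 0.0],
    [0.0, 8.0, 0.0],
    [0.0, 0.0, 8.0],
]
confused = [
    [1.0, 1.0, 1.0],
    [1.0, 1.0, 1.0],
    [1.0, 1.0, 1.0],
]

aligned_norm = frobenius_norm(role_clarity_matrix(aligned))
confused_norm = frobenius_norm(role_clarity_matrix(confused))
print(aligned_norm, confused_norm)
```

In this sketch, the norm is close to zero when each agent's trajectory matches only its own role description, and it grows when every trajectory is equally similar to every role, which is the behavior a regularizer built on $M(\phi)$ would penalize.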

Related benchmarks

Task	Dataset	Metric	Result
Role clarity	SWE easy (dev)	Role Clarity Score	0.9081	8
Role clarity	SWE hard (dev)	Role Clarity Score	90.76	8
Role clarity	SWE (dev total)	Total Role Clarity Score	90.79	8
Multi-Agent Collaboration	SRDD	Completeness	76.35	4
Multi-Agent Collaboration Role Overstepping	SWE easy (dev)	Overstepping Rate (<INFO>)	0.4	4
Multi-Agent Collaboration Role Overstepping	SWE hard subset (dev)	Overstepping Rate (<INFO>)	0.00e+0	4
Multi-Agent Collaboration Role Overstepping	SWE total full set (dev)	Overstepping Rate (<INFO>)	0.2	4
Role Consistency	SWE easy subset dev (test)	Overstepping Rate (<INFO>)	10	4
Role Consistency	SWE Dev hard (test)	Overstepping Rate (<INFO>)	6.8	4
Role Consistency	SWE dev full set (test)	Total Overstepping Rate (<INFO>)	8.4	4
