CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society

About

The rapid advancement of chat-based language models has led to remarkable progress in complex task-solving. However, their success heavily relies on human input to guide the conversation, which can be challenging and time-consuming. This paper explores the potential of building scalable techniques to facilitate autonomous cooperation among communicative agents, and provides insight into their "cognitive" processes. To address the challenges of achieving autonomous cooperation, we propose a novel communicative agent framework named role-playing. Our approach involves using inception prompting to guide chat agents toward task completion while maintaining consistency with human intentions. We showcase how role-playing can be used to generate conversational data for studying the behaviors and capabilities of a society of agents, providing a valuable resource for investigating conversational language models. In particular, we conduct comprehensive studies on instruction-following cooperation in multi-agent settings. Our contributions include introducing a novel communicative agent framework, offering a scalable approach for studying the cooperative behaviors and capabilities of multi-agent systems, and open-sourcing our library to support research on communicative agents and beyond: https://github.com/camel-ai/camel.

Guohao Li, Hasan Abed Al Kader Hammoud, Hani Itani, Dmitrii Khizbullin, Bernard Ghanem • 2023
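
The role-playing setup described in the abstract can be pictured as two chat agents that each receive a fixed role prompt up front and then exchange messages without further human input. The sketch below is only an illustration under stated assumptions: it does not use the camel library, the `Agent` type, the `DONE` marker, the prompt wording, and the stub agents are all hypothetical, and a real system would call a chat model where the stubs are.

```python
from typing import Callable, List, Tuple

# Hypothetical agent type: (system_prompt, incoming_message) -> reply.
Agent = Callable[[str, str], str]

DONE = "<TASK_DONE>"  # assumed termination marker, chosen for this sketch


def make_inception_prompts(task: str, instructor_role: str,
                           assistant_role: str) -> Tuple[str, str]:
    """Build one fixed system prompt per role (illustrative wording only)."""
    instructor_prompt = (
        f"You are a {instructor_role}. Instruct the {assistant_role} one step "
        f"at a time to complete this task: {task}. Reply {DONE} when finished."
    )
    assistant_prompt = (
        f"You are a {assistant_role}. Follow each instruction from the "
        f"{instructor_role} and reply with a concrete solution for: {task}."
    )
    return instructor_prompt, assistant_prompt


def role_play(instructor: Agent, assistant: Agent, prompts: Tuple[str, str],
              max_turns: int = 10) -> List[Tuple[str, str]]:
    """Alternate instruction and solution turns until DONE or max_turns."""
    instructor_prompt, assistant_prompt = prompts
    transcript: List[Tuple[str, str]] = []
    message = "Begin."
    for _ in range(max_turns):
        instruction = instructor(instructor_prompt, message)
        if DONE in instruction:
            break
        message = assistant(assistant_prompt, instruction)
        transcript.append((instruction, message))
    return transcript


def scripted_instructor(steps: List[str]) -> Agent:
    """Stub standing in for a chat model: emits fixed steps, then DONE."""
    remaining = list(steps)

    def agent(system_prompt: str, message: str) -> str:
        return remaining.pop(0) if remaining else DONE

    return agent


def echo_assistant(system_prompt: str, message: str) -> str:
    """Stub standing in for a chat model: acknowledges each instruction."""
    return f"Solution for: {message}"


prompts = make_inception_prompts(
    "write a sorting function", "product manager", "programmer"
)
transcript = role_play(
    scripted_instructor(["Define the interface", "Add tests"]),
    echo_assistant,
    prompts,
)
for instruction, solution in transcript:
    print(f"Instruction: {instruction}\n{solution}\n")
```

The returned transcript is the kind of conversational data the abstract refers to; the `max_turns` cap is one simple way to bound a conversation that never signals completion.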

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Mathematical Reasoning | GSM8K | Accuracy | 45.6 | 1362 |
| Code Generation | HumanEval | Pass@1 | 31.71 | 1036 |
| Mathematical Reasoning | MATH | Accuracy | 22.3 | 535 |
| Mathematical Reasoning | MATH 500 | pass@1 | 67.4 | 239 |
| Code Generation | MBPP | Accuracy (%) | 78.1 | 146 |
| Mathematical Reasoning | GSM8K | EM | 88.6 | 123 |
| Science Reasoning | GPQA | Pass@1 | 11.11 | 50 |
| Mathematical Reasoning | AIME | Pass@1 | 6.67 | 44 |
| Claim Verification | AVeriTeC Golden (dev) | Accuracy | 81.8 | 28 |
| Claim Verification | AVeriTeC Retrieved (I) (dev) | Accuracy | 68.4 | 28 |
Showing 10 of 43 rows
