Is this the real life? Is this just fantasy? The Misleading Success of Simulating Social Interactions With LLMs

About

Recent advances in large language models (LLMs) have enabled richer social simulations, allowing for the study of various social phenomena. However, most recent work has used a more omniscient perspective on these simulations (e.g., a single LLM generating all interlocutors), which is fundamentally at odds with the non-omniscient, information-asymmetric interactions that involve humans and AI agents in the real world. To examine these differences, we develop an evaluation framework to simulate social interactions with LLMs in various settings (omniscient, non-omniscient). Our experiments show that LLMs perform better in unrealistic, omniscient simulation settings but struggle in ones that more accurately reflect real-world conditions with information asymmetry. Our findings indicate that addressing information asymmetry remains a fundamental challenge for LLM-based agents.
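The abstract contrasts omniscient simulation, where one model voices every interlocutor with all information in view, against non-omniscient interaction, where each party knows only its own private information. The Python sketch below illustrates that distinction under stated assumptions: a simple alternating-turn loop, models treated as plain prompt-to-text functions, and hypothetical names (`Agent`, `build_context`, `simulate`, the bike-negotiation scenario) that are illustrations only, not the authors' framework or API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical sketch: a "model" is any function mapping a prompt to one reply.
# Real LLM calls are deliberately left out; plug in your own client.
Model = Callable[[str], str]


@dataclass
class Agent:
    name: str
    background: str    # public information every participant may see
    private_goal: str  # information that is asymmetric in the realistic setting


def build_context(speaker: Agent, agents: List[Agent],
                  history: List[Tuple[str, str]], omniscient: bool) -> str:
    """Assemble the prompt the speaker's model sees on its turn."""
    lines = []
    for agent in agents:
        lines.append(f"{agent.name} background: {agent.background}")
        # Omniscient: every goal is visible. Non-omniscient: only the speaker's own.
        if omniscient or agent.name == speaker.name:
            lines.append(f"{agent.name} goal: {agent.private_goal}")
    lines.extend(f"{who}: {utterance}" for who, utterance in history)
    lines.append(f"{speaker.name}:")
    return "\n".join(lines)


def simulate(agents: List[Agent], models: Dict[str, Model],
             turns: int, omniscient: bool) -> List[Tuple[str, str]]:
    """Alternate turns; in omniscient mode one shared model voices everyone."""
    history: List[Tuple[str, str]] = []
    for turn in range(turns):
        speaker = agents[turn % len(agents)]
        model = models["shared"] if omniscient else models[speaker.name]
        prompt = build_context(speaker, agents, history, omniscient)
        history.append((speaker.name, model(prompt)))
    return history


if __name__ == "__main__":
    # Hypothetical scenario used only for illustration.
    buyer = Agent("Buyer", "wants a used bike", "pay at most $100")
    seller = Agent("Seller", "selling a used bike", "accept no less than $120")
    prompts: List[str] = []

    def stub(prompt: str) -> str:
        prompts.append(prompt)  # record what each speaker was shown
        return "(reply)"

    models = {"shared": stub, "Buyer": stub, "Seller": stub}
    simulate([buyer, seller], models, turns=2, omniscient=False)
    # In the non-omniscient run, the Buyer's first prompt omits the Seller's goal.
    print("accept no less than $120" in prompts[0])  # False
```

In this sketch the two settings differ only in which model speaks and what enters each prompt; that restriction on context is the information asymmetry the abstract argues realistic settings impose.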

Xuhui Zhou, Zhe Su, Tiwalayo Eisape, Hyunwoo Kim, Maarten Sap · 2024

Related benchmarks

Task	Dataset	Metric	Result	
Dialogue Naturalness	Persona-Scenario	GPT-Score: Fluency	98.55	3
Dialogue Quality Evaluation	Persona-Scenario	Believability	3.7	3
Dialogue Goal Completion	Persona-Scenario	Believability (Sotopia-Eval)	8.95	3
Meeting Transcript Evaluation	OMNI	Coherence	3.5	1
