CoSER: A Comprehensive Literary Dataset and Framework for Training and Evaluating LLM Role-Playing and Persona Simulation

About

Role-playing language agents (RPLAs) have emerged as promising applications of large language models (LLMs). However, simulating established characters presents a challenging task for RPLAs, due to the lack of authentic character datasets and nuanced evaluation methods using such data. In this paper, we present CoSER, a collection of a high-quality dataset, open models, and an evaluation protocol towards effective RPLAs of established characters. The CoSER dataset covers 17,966 characters from 771 renowned books. It provides authentic dialogues with real-world intricacies, as well as diverse data types such as conversation setups, character experiences and internal thoughts. Drawing from acting methodology, we introduce given-circumstance acting for training and evaluating role-playing LLMs, where LLMs sequentially portray multiple characters in book scenes. Using our dataset, we develop CoSER 8B and CoSER 70B, i.e., advanced open role-playing LLMs built on LLaMA-3.1 models. Extensive experiments demonstrate the value of the CoSER dataset for RPLA training, evaluation and retrieval. Moreover, CoSER 70B exhibits state-of-the-art performance surpassing or matching GPT-4o on our evaluation and three existing benchmarks, i.e., achieving 75.80% and 93.47% accuracy on the InCharacter and LifeChoice benchmarks respectively.

Xintao Wang, Heng Wang, Yifei Zhang, Xinfeng Yuan, Rui Xu, Jen-tse Huang, Siyu Yuan, Haoran Guo, Jiangjie Chen, Shuchang Zhou, Wei Wang, Yanghua Xiao • 2025

Related benchmarks

Task	Dataset	Result
User Simulation	τ-USI retail airline tasks (out-of-distribution)	Conv Score: 37.8	16
Role-playing Dialogue Evaluation	FURINA-Bench English	Context Reliance: 11.16	15
Role-playing	Role-playing evaluation (Main characters)	ROUGE-L (Haruhi): 83.19	12
Human behavior simulation	SOUL (Social Understanding and Learning) (test)	FanToM: 3	9
Role-playing performance evaluation	Fandom (test)	Haruhi Adherence Score: 60.54	8
Role-playing performance evaluation	Bandori (test)	PoPiPa Score: 75.29	8
Role-playing performance	Role-playing Artifacts Minor characters 1.0	Score (K-On!): 82.34	7
Role-playing	Role-playing evaluation (Minor characters)	K-On! ROUGE-L: 20.19	5
Role-playing performance	Bandori	PoPiPa Score: 41.83	4
Role-Playing Action Prediction	BanG Dream! Girls Band Party! Event 321 released on Feb 8th, 2026 (Live storyline)	Kasumi Performance: 40.87	4

Showing 10 of 11 rows
