HER: Human-like Reasoning and Reinforcement Learning for LLM Role-playing

About

LLM role-playing, i.e., using LLMs to simulate specific personas, has emerged as a key capability in various applications, such as companionship, content creation and digital games. While current models effectively capture character tones and knowledge, simulating the inner thoughts behind their behaviors remains a challenge. Towards cognitive simulation in LLM role-play, previous efforts mainly suffer from two deficiencies: lacking data with high-quality reasoning traces, and lacking reliable reward signals aligned with human preferences. In this paper, we propose HER, a unified framework for cognitive-level persona simulation. HER introduces dual-layer thinking, which distinguishes characters' first-person thinking from LLMs' third-person thinking. To bridge these gaps, we curate reasoning-augmented role-playing data via reverse engineering, and construct human-aligned principles and reward models. Leveraging these resources, we train HER models based on Qwen3-32B via supervised and reinforcement learning. Extensive experiments validate the effectiveness of our approach. Notably, our models significantly outperform the Qwen3-32B baseline, achieving a 30.26 improvement on the CoSER benchmark and a 14.97% gain on the Minimax Role-Play Bench. Our datasets, principles, and models are released to facilitate future research.

Chengyu Du, Xintao Wang, Aili Chen, Weiyuan Li, Rui Xu, Junteng Liu, Zishan Huang, Rong Tian, Zijun Sun, Yuhao Li, Liheng Feng, Deming Ding, Pengyu Zhao, Yanghua Xiao • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Theory of Mind | HiToM | Accuracy: 56 | 64 |
| Role-Play | Minimax Role-Play Bench Full 17-Model Leaderboard 1.0 | Overall Score: 84.65 | 17 |
| Role-Play Evaluation | Minimax Role-Play Bench | Average Score: 84.65 | 17 |
| Role-Play Evaluation | CoSER | Avg Score: 57.3 | 17 |
| Role-playing | DynSess-Eval Human Evaluation 1.0 (test) | Average Score: 2.99 | 10 |
| Human behavior simulation | SOUL (Social Understanding and Learning) (test) | FanToM: 55 | 9 |
| Role-playing evaluation | DynSess-Eval | Average Performance (Auto): 3.23 | 8 |
| Role-Play | LifeChoices | Primary Score: 75 | 6 |
| User Simulation | UserLLM | Primary Metric Score: 53.7 | 6 |
| Persona Simulation | TwinVoice | Accuracy: 39 | 6 |