DynSess: Dynamic Session-Level Evaluation and Optimization Framework for Role-Playing Agents

About

Role-playing with large language models is fundamentally a session-level task, requiring agents to sustain character identity and interaction quality across extended multi-turn conversations. Yet existing evaluation and optimization methods remain largely turn-level, failing to capture long-horizon quality. We propose DynSess, a unified session-level framework for role-playing agents. DynSess-Eval scores complete dialogue sessions via rubrics targeting long-horizon behaviors. Leveraging its session-level rewards, we construct high-quality training trajectories through multi-turn lookahead search and train DynSess-Character with two complementary variants: DSPO (off-policy) and GSRPO (on-policy). Experiments show that DynSess-Eval aligns with human judgments substantially better than prior evaluators, and blind human evaluation further shows that DynSess-Character matches the strongest character model despite using substantially fewer parameters, while maintaining strong role consistency and interactive ability. Our dataset and code will be released to facilitate future research.

Rongsheng Zhang, Jiji Tang, Junnan Ren, Zuyi Bao, Weijie Chen, Ruofan Hu, Zhou Zhao, Tangjie Lv, Yan Zhang • 2026

Related benchmarks

Task	Dataset	Result
Role-playing	DynSess-Eval Human Evaluation 1.0 (test)	Average Score: 3.37	10
Role-playing evaluation	DynSess-Eval	--	8
Evaluator Alignment with Human Judgments	DynSess-Eval 8 personas (test)	Role Consistency Rank Acc: 67	7

