
SDPO: Segment-Level Direct Preference Optimization for Social Agents

About

Social agents powered by large language models (LLMs) can simulate human social behaviors but fall short in handling complex social dialogues. Direct Preference Optimization (DPO) has proven effective in aligning LLM behavior with human preferences across various agent tasks. However, standard DPO focuses solely on individual turns, which limits its effectiveness in multi-turn social interactions. Several DPO-based multi-turn alignment methods with session-level data have shown potential in addressing this problem. While these methods consider multiple turns across entire sessions, they are often overly coarse-grained, introducing training noise, and lack robust theoretical support. To resolve these limitations, we propose Segment-Level Direct Preference Optimization (SDPO), which dynamically selects key segments within interactions to optimize multi-turn agent behavior. SDPO minimizes training noise and is grounded in a rigorous theoretical framework. Evaluations on the SOTOPIA benchmark demonstrate that SDPO-tuned agents consistently outperform both existing DPO-based methods and proprietary LLMs like GPT-4o, underscoring SDPO's potential to advance the social intelligence of LLM-based agents. We release our code and data at https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/SDPO.
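The abstract contrasts turn-level and session-level DPO with SDPO's segment-level variant. As a rough illustration of the idea (not the paper's implementation), the sketch below applies the standard DPO objective while restricting the log-probability sums to a selected key segment of the dialogue via a token mask; the function name, inputs, and masking scheme are all assumptions for illustration only.

```python
import math

def segment_dpo_loss(policy_logps, ref_logps, segment_mask, beta=0.1):
    """Illustrative sketch of a segment-level DPO loss (not the paper's code).

    policy_logps / ref_logps: dicts with "chosen" and "rejected" lists of
    per-token log-probabilities under the policy and reference models.
    segment_mask: matching dicts of 0/1 flags selecting the key segment;
    tokens outside the segment contribute nothing to the loss.
    """
    def masked_sum(logps, mask):
        # Sum log-probs only over tokens inside the selected segment
        return sum(lp for lp, m in zip(logps, mask) if m)

    # Policy/reference log-ratio, restricted to the key segment
    chosen = (masked_sum(policy_logps["chosen"], segment_mask["chosen"])
              - masked_sum(ref_logps["chosen"], segment_mask["chosen"]))
    rejected = (masked_sum(policy_logps["rejected"], segment_mask["rejected"])
                - masked_sum(ref_logps["rejected"], segment_mask["rejected"]))

    # Standard DPO objective: -log sigmoid(beta * (chosen - rejected))
    return -math.log(1.0 / (1.0 + math.exp(-beta * (chosen - rejected))))
```

A larger log-ratio margin between the chosen and rejected segments drives the loss toward zero, exactly as in turn- or session-level DPO; only the set of tokens entering the sums changes.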

Aobo Kong, Wentao Ma, Shiwan Zhao, Yongbin Li, Yuchuan Wu, Ke Wang, Xiaoqian Liu, Qicheng Li, Yong Qin, Fei Huang • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Social Dialogue | SOTOPIA Self-Chat | GOAL | 8.56 | 28 |
| Social Dialogue | SOTOPIA Interaction with GPT-4o | Goal Score | 8.14 | 28 |
| Social Dialogue | SOTOPIA Overall (AVG) | AVG Score | 5.63 | 11 |
| Social Dialogue | SOTOPIA Interaction with GPT-4o-mini | GOAL Score | 7.53 | 11 |
| Next-item prediction | Amazon Review Industrial (test) | HR@3 | 0.1032 | 11 |
| Next-item prediction | Amazon Review Office (test) | HR@3 | 11.69 | 11 |

Other info

Code: https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/SDPO