
FedPDPO: Federated Personalized Direct Preference Optimization for Large Language Model Alignment

About

Aligning large language models (LLMs) with human preferences in federated learning (FL) is challenging due to decentralized, privacy-sensitive, and highly non-IID preference data. Direct Preference Optimization (DPO) offers an efficient alternative to reinforcement learning from human feedback (RLHF), but its direct application in FL suffers severe performance degradation under non-IID data and limited generalization of implicit rewards. To bridge this gap, we propose FedPDPO (Federated Personalized Direct Preference Optimization), a personalized federated framework for preference alignment of LLMs. It adopts a parameter-efficient fine-tuning architecture in which each client maintains a frozen pretrained LLM backbone augmented with a Low-Rank Adaptation (LoRA) adapter, enabling communication-efficient aggregation. To address non-IID heterogeneity, we devise (1) a globally shared LoRA adapter paired with a personalized, client-specific LLM head. Moreover, we introduce (2) a personalized DPO training strategy with a client-specific explicit reward head to complement implicit rewards and further alleviate non-IID heterogeneity, and (3) a bottleneck adapter to balance global and local features. We provide theoretical analysis establishing the probabilistic foundation and soundness of the framework. Extensive experiments on multiple preference datasets demonstrate state-of-the-art performance, with up to 4.80% average accuracy improvement in federated intra-domain and cross-domain settings.
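For context on the objective the abstract builds on: standard DPO minimizes a logistic loss over the implicit reward margin between a chosen and a rejected response, measured as policy-vs-reference log-probability ratios. The sketch below shows that base loss in plain Python; it is an illustration of vanilla DPO only, not the paper's personalized variant (the explicit reward head and per-client terms are omitted), and the function and argument names are our own.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Vanilla DPO loss for one preference pair (illustrative sketch).

    logp_*      : policy log-probabilities of the chosen/rejected responses
    ref_logp_*  : reference-model log-probabilities of the same responses
    beta        : temperature scaling the implicit reward margin
    """
    # Implicit reward of each response is beta * (log pi_theta - log pi_ref);
    # the loss pushes the chosen response's reward above the rejected one's.
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -math.log(sigmoid(margin))
```

With identical policy and reference log-probabilities the margin is zero and the loss equals log 2; increasing the chosen response's log-probability relative to the reference drives the loss down.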

Kewen Zhu, Liping Yi, Zhiming Zhao, Zhuang Qi, Han Yu, Qinghua Hu • 2026
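The abstract's split between a globally shared LoRA adapter and personalized client-specific heads implies a FedAvg-style aggregation that averages only the shared adapter parameters while each head stays local. A minimal sketch of that aggregation step, assuming hypothetical parameter names (keys prefixed `head.` mark personalized parameters) and scalar parameters for brevity:

```python
def aggregate_lora(client_updates, weights=None):
    """FedAvg over shared LoRA parameters only (illustrative sketch).

    client_updates : list of dicts mapping parameter name -> value;
                     keys starting with "head." are personalized and
                     are excluded from aggregation (naming is assumed,
                     not taken from the paper).
    weights        : optional per-client aggregation weights (default: uniform).
    """
    n = len(client_updates)
    if weights is None:
        weights = [1.0 / n] * n
    # Aggregate only the shared adapter parameters.
    shared_keys = [k for k in client_updates[0] if not k.startswith("head.")]
    return {
        k: sum(w * update[k] for w, update in zip(weights, client_updates))
        for k in shared_keys
    }
```

The returned dict contains only the averaged shared-adapter parameters; each client keeps its own head untouched, which is how the framework preserves personalization under non-IID data.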

Related benchmarks

Task                            Dataset                                     Accuracy  Rank
Direct Preference Optimization  IMDB (3 clients)                            87.41     11
Direct Preference Optimization  IMDB (10 clients)                           85.23     11
Direct Preference Optimization  Code-Vulnerability-Security (3 clients)     96.92     11
Direct Preference Optimization  Code-Vulnerability-Security (11 clients)    92.8      11
Direct Preference Optimization  WebGPT                                      58.92     11
Direct Preference Optimization  PyDPO                                       91.47     11
Direct Preference Optimization  UltraFeedback                               69.92     11
Preference Alignment            WebGPT (test)                               61.24     11
Preference Alignment            PyDPO (test)                                94.32     11
Preference Alignment            UltraFeedback (test)                        74.18     11
