On a Connection Between Imitation Learning and RLHF

About

This work studies the alignment of large language models with preference data from an imitation learning perspective. We establish a close theoretical connection between reinforcement learning from human feedback (RLHF) and imitation learning (IL), revealing that RLHF implicitly performs imitation learning on the preference data distribution. Building on this connection, we propose DIL, a principled framework that directly optimizes the imitation learning objective. DIL provides a unified imitation learning perspective on alignment, encompassing existing alignment algorithms as special cases while naturally introducing new variants. By bridging IL and RLHF, DIL offers new insights into alignment with RLHF. Extensive experiments demonstrate that DIL outperforms existing methods on various challenging benchmarks.
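As background for the RLHF side of this connection, the sketch below illustrates two standard ingredients, assuming the usual KL-regularized RLHF objective with a Bradley-Terry preference model over a small, finite set of candidate responses: the preference probability sigmoid(r_chosen - r_rejected), and the closed-form optimal policy pi*(y|x) ∝ pi_ref(y|x) exp(r(x,y)/beta). It is not the paper's DIL objective or derivation, and all function names are hypothetical.

```python
import math
from typing import List


def bradley_terry_prob(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry probability that the chosen response is preferred."""
    return 1.0 / (1.0 + math.exp(-(r_chosen - r_rejected)))


def kl_regularized_optimal_policy(
    ref_probs: List[float], rewards: List[float], beta: float
) -> List[float]:
    """Closed-form optimum of E[r] - beta * KL(pi || pi_ref) over finite candidates.

    pi*(y) is proportional to pi_ref(y) * exp(r(y) / beta). The max logit is
    subtracted before exponentiating for numerical stability; this does not
    change the normalized distribution.
    """
    scaled = [r / beta for r in rewards]
    m = max(scaled)
    weights = [p * math.exp(s - m) for p, s in zip(ref_probs, scaled)]
    z = sum(weights)
    return [w / z for w in weights]


def implicit_reward(policy_probs: List[float], ref_probs: List[float], beta: float) -> List[float]:
    """beta * log(pi / pi_ref): recovers the reward up to a shared additive constant."""
    return [beta * math.log(p / q) for p, q in zip(policy_probs, ref_probs)]


if __name__ == "__main__":
    ref = [0.5, 0.3, 0.2]         # hypothetical reference-policy probabilities
    rewards = [1.0, 0.0, -1.0]    # hypothetical rewards for the three candidates
    pi_star = kl_regularized_optimal_policy(ref, rewards, beta=1.0)
    print(pi_star)                # mass shifts toward the highest-reward candidate
    print(implicit_reward(pi_star, ref, beta=1.0))
    print(bradley_terry_prob(rewards[0], rewards[1]))
```

Larger beta keeps the optimal policy closer to pi_ref, while smaller beta concentrates it on high-reward responses; differences of the implicit rewards reproduce the reward differences exactly, which is the property that lets a policy and a reward be viewed as two parameterizations of the same preference model.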

Teng Xiao, Yige Yuan, Mingxiao Li, Zhengyu Chen, Vasant G. Honavar • 2025

Related benchmarks

Task	Dataset	Result
Question Answering	ARC-Challenge 0-shot (test)	Accuracy: 90	48
Personalization	Community Alignment (CA)	Personalization Win Rate: 84.17	45
Personalization	Multi-Bench (MB)	Win Rate: 90.48	45
Personalization	PRISM	Personalization Win Rate: 78.52	45
Multiple-choice Question Answering	MMLU zero-shot (test)	Accuracy: 76	27
Mathematical Reasoning	SVAMP 8-shot (test)	Accuracy: 92	25
Mathematical Reasoning	GSM8K 8-shot (test)	Accuracy: 92.5	25
Multiple-choice Question Answering	ARC-Easy zero-shot (test)	Accuracy: 93.6	25

