
Extended Inductive Reasoning for Personalized Preference Inference from Behavioral Signals

About

Large language models (LLMs) have demonstrated significant success in complex reasoning tasks such as math and coding. In contrast to these tasks, where deductive reasoning predominates, inductive reasoning, the ability to derive general rules from incomplete evidence, remains underexplored. This paper investigates extended inductive reasoning in LLMs through the lens of personalized preference inference, a critical challenge in LLM alignment where current approaches struggle to capture diverse user preferences. The task demands strong inductive reasoning capabilities, as user preferences are typically embedded implicitly across various forms of interaction, requiring models to synthesize consistent preference patterns from scattered signals. We propose AlignXplore, a model that leverages extended reasoning chains to enable systematic preference inference from behavioral signals in users' interaction histories. Such explicit preference articulation enables efficient streaming inference: when new behavioral signals emerge, the model can build directly on previously inferred preference descriptions rather than reprocessing historical signals from scratch, while also supporting iterative refinement of the inferred preferences. We develop AlignXplore by combining cold-start training on synthetic data with subsequent online reinforcement learning. Through extensive experiments, we demonstrate that AlignXplore improves over the backbone model by an average of 15.49% on in-domain and out-of-domain benchmarks, while maintaining strong generalization across different input formats and downstream models. Further analyses establish best practices for preference inference learning through a systematic comparison of reward modeling strategies, and reveal the emergence of human-like inductive reasoning patterns during training.
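The streaming inference idea in the abstract can be sketched in a few lines. Note this is a minimal illustration, not the paper's implementation: `infer_preferences` is a hypothetical stand-in for a call to the AlignXplore model, stubbed here with simple string composition purely to show the update loop, where each step consumes only the prior preference description plus the new signals, never the full history.

```python
def infer_preferences(prior_description: str, new_signals: list[str]) -> str:
    """Hypothetical stand-in for the model: refine a textual preference
    description from the prior description and new behavioral signals."""
    evidence = "; ".join(new_signals)
    if not prior_description:
        return f"User preferences inferred from: {evidence}"
    # Iterative refinement: build on the prior description directly.
    return f"{prior_description} | refined with: {evidence}"


def stream_preference_inference(signal_batches: list[list[str]]) -> str:
    """Process behavioral signals incrementally. Each step carries forward
    only the current description, so cost does not grow with history length."""
    description = ""
    for batch in signal_batches:
        description = infer_preferences(description, batch)
    return description


batches = [
    ["upvoted concise answers"],
    ["skipped long tutorials", "liked bullet-point summaries"],
]
print(stream_preference_inference(batches))
```

The key design point the sketch mirrors is that the inferred preference description acts as a compact, human-readable state, so new evidence triggers a cheap update instead of a full reprocessing of the interaction history.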

Jia-Nan Li, Jian Guan, Wei Wu, Rui Yan • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Recommendation | MovieLens | Accuracy | 69.93 | 84 |
| Response Selection | AlignX | Accuracy | 69.9 | 16 |
| Response Selection | P-Soups Informativeness | Accuracy | 76.24 | 16 |
| Response Selection | PersonaMem | Accuracy | 53.98 | 16 |
| Response Selection | P-Soups Style | Accuracy | 0.78 | 16 |
| Response Selection | P-Soups Expertise | Accuracy | 72.66 | 16 |
| Recommendation | MIND | Accuracy | 61.23 | 16 |
| Response Generation | HiCUPID | Accuracy | 53.5 | 16 |
| Recommendation | AMAZON | Accuracy | 79.01 | 16 |
