Text as a Universal Interface for Transferable Personalization
About
We study the problem of personalization in large language models (LLMs). Prior work predominantly represents user preferences as implicit, model-specific vectors or parameters, yielding opaque "black-box" profiles that are difficult to interpret and transfer across models and tasks. In contrast, we advocate natural language as a universal, model- and task-agnostic interface for preference representation. This formulation yields interpretable and reusable preference descriptions, and naturally supports continual evolution as new interactions are observed. To learn such representations, we introduce a two-stage training framework that combines supervised fine-tuning on high-quality synthesized data with reinforcement learning to optimize long-term utility and cross-task transferability. Based on this framework, we develop AlignXplore+, a universal preference reasoning model that generates textual preference summaries. Experiments on nine benchmarks show that our 8B model achieves state-of-the-art performance -- outperforming substantially larger open-source models -- while exhibiting strong transferability across tasks, model families, and interaction formats.
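The core idea -- a textual preference profile that any downstream LLM can consume as plain prompt text -- can be sketched as follows. Note that `summarize_preferences` below is a trivial rule-based stand-in for a preference reasoning model such as AlignXplore+, and the interaction schema is illustrative, not the paper's actual data format.

```python
# Sketch: natural-language preference profiles as a model-agnostic interface.
# summarize_preferences is a stub standing in for a learned preference
# reasoning model; the dict-based interaction format is an assumption.

def summarize_preferences(interactions: list[dict]) -> str:
    """Distill observed interactions into a textual preference profile."""
    liked = [x["item"] for x in interactions if x["feedback"] == "like"]
    disliked = [x["item"] for x in interactions if x["feedback"] == "dislike"]
    parts = []
    if liked:
        parts.append("The user prefers: " + ", ".join(liked) + ".")
    if disliked:
        parts.append("The user dislikes: " + ", ".join(disliked) + ".")
    return " ".join(parts)

def personalize_prompt(profile: str, task_prompt: str) -> str:
    """Prepend the profile as plain text -- no model-specific vectors,
    so the same profile transfers across model families and tasks."""
    return f"User preference profile: {profile}\n\nTask: {task_prompt}"

history = [
    {"item": "concise answers", "feedback": "like"},
    {"item": "sci-fi movies", "feedback": "like"},
    {"item": "jargon-heavy text", "feedback": "dislike"},
]
profile = summarize_preferences(history)
prompt = personalize_prompt(profile, "Recommend a movie for tonight.")
print(prompt)
```

Because the profile is ordinary text, it can be appended to as new interactions arrive and passed unchanged to any instruction-following model.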
Related benchmarks
| Task | Dataset | Metric | Score | Rank |
|---|---|---|---|---|
| Recommendation | MovieLens | Accuracy | 77.23 | 84 |
| Response Selection | AlignX | Accuracy | 75.03 | 16 |
| Response Selection | P-Soups Informativeness | Accuracy | 78.07 | 16 |
| Recommendation | MIND | Accuracy | 71.8 | 16 |
| Recommendation | AMAZON | Accuracy | 86.39 | 16 |
| Response Generation | HiCUPID | Accuracy | 62.42 | 16 |
| Response Selection | P-Soups Style | Accuracy | 0.8633 | 16 |
| Response Selection | P-Soups Expertise | Accuracy | 82.5 | 16 |
| Response Selection | PersonaMem | Accuracy | 58.08 | 16 |