Learning Transferable Latent User Preferences for Human-Aligned Decision Making

About

Large language models (LLMs) are increasingly used as reasoning modules in many applications. Although they are efficient at certain tasks, LLMs often struggle to produce human-aligned solutions. Human-aligned decision making requires accounting for both explicitly stated goals and latent user preferences that shape how ambiguous situations should be resolved. Existing approaches to incorporating such preferences either rely on extensive and repeated user interactions or fail to generalize latent preferences across tasks and contexts, limiting their practical applicability. We consider a setting in which an LLM is used for high-level reasoning and is responsible for inferring latent user preferences from limited interactions; the inferred preferences then guide downstream decision making. We introduce CLIPR (Conversational Learning for Inferring Preferences and Reasoning), a framework that learns actionable, transferable natural language rules that represent latent user preferences from minimal conversational input. These rules are iteratively refined through adaptive feedback and applied to both in-distribution and out-of-distribution ambiguous tasks across multiple environments. Evaluations on three datasets and a user study show that CLIPR consistently outperforms existing methods, improving alignment and reducing inference costs.
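
The loop described in the abstract (learn a natural language rule from a short interaction, refine it from feedback, then reuse it on a new ambiguous task) can be illustrated with a toy sketch. This is not the authors' implementation: the `Rule` and `RuleStore` names, the keyword matching that stands in for LLM reasoning, and the example tasks are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """A hypothetical preference rule (illustrative only, not from the paper)."""
    text: str     # human-readable natural language rule
    keyword: str  # context word that activates the rule (toy stand-in for LLM matching)
    choice: str   # option the rule prefers


@dataclass
class RuleStore:
    rules: list = field(default_factory=list)

    def decide(self, task, options):
        """Return the option preferred by the first matching rule, else None."""
        for rule in self.rules:
            if rule.keyword in task.lower() and rule.choice in options:
                return rule.choice
        return None  # still ambiguous: a real system would ask the user here

    def refine(self, keyword, user_choice):
        """Replace any rule for this keyword with one reflecting the user's feedback."""
        self.rules = [r for r in self.rules if r.keyword != keyword]
        self.rules.append(Rule(
            text=f"When the task involves {keyword}, prefer {user_choice}.",
            keyword=keyword,
            choice=user_choice,
        ))


store = RuleStore()
first = store.decide("Bring me a drink", ["water", "soda"])  # no rule learned yet
store.refine("drink", "water")                                # one piece of user feedback
second = store.decide("Pour a drink for the guest", ["soda", "water"])  # new task, same rule
print(first, second)
```

The second call resolves the ambiguity without asking the user again, which is the kind of transfer the abstract refers to; in the framework itself an LLM, not keyword matching, would infer and apply the rules.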

Alina Hyk, Sandhya Saisubramanian • 2026

Related benchmarks

Task	Dataset	Result
Introspective Planning	KitchenAmbig (OOD)	Average Accuracy 97.6	10
Preference-aligned decision making	AmbiK (test)	Accuracy 84.6	10
Preference-aligned decision making	Housekeep (test)	Accuracy 42.5	10
Preference-aligned decision making	Mobile Manipulation (test)	Accuracy 67.1	10
Introspective Planning	KitchenAmbig (In-Distribution)	Average Accuracy 94.3	10
User-aligned task completion	KitchenAmbig (In-Distribution)	Accuracy 84	3
User-aligned task completion	KitchenAmbig (OOD)	Accuracy 87.3	3
