Latent Preference Modeling for Cross-Session Personalized Tool Calling

About

Users often omit essential details in their requests to LLM-based agents, resulting in under-specified inputs for tool use. This poses a fundamental challenge for tool-augmented agents, as API execution typically requires complete arguments, highlighting the need for personalized tool calling. To study this problem, we introduce MPT, a benchmark comprising 265 multi-session dialogues that cover three challenges: Preference Recall, Preference Induction, and Preference Transfer. We also propose PRefine, a test-time memory-augmented method that represents user preferences as evolving hypotheses. Through a generate-verify-refine loop, it extracts reusable constraints from history and improves tool-calling accuracy while using only 1.24% of the tokens required by full-history prompting. These results indicate that robust personalization in agentic systems depends on memory that captures the reasons behind user choices, not just the choices themselves.

Yejin Yoon, Minseo Kim, Taeuk Kim • 2026

Related benchmarks

Task                            Dataset                                   Metric     Result  Rank
Latent Preference Modeling      MPT Context-Free, Preference Recall       Precision  76.1    19
Latent Preference Modeling      MPT Context-Free, Preference Induction    Precision  0.5487  19
Latent Preference Modeling      MPT Context-Free, Preference Transfer     Precision  30.92   19
Latent Preference Modeling      MPT Context-Free Average                  F1 Score   58.5    19
Preference-driven Tool Calling  MPT Context-Guided, Preference Recall     P-EM       64.88   19
Preference-driven Tool Calling  MPT Context-Guided, Preference Induction  P-EM       37.95   19
Preference-driven Tool Calling  MPT Context-Guided, Preference Transfer   P-EM       26.19   19
Preference-driven Tool Calling  MPT Context-Guided Average                OA-F1      67.18   19