Protecting Users From Themselves: Safeguarding Contextual Privacy in Interactions with Conversational Agents

About

Conversational agents are increasingly woven into individuals' personal lives, yet users often underestimate the privacy risks associated with them. The moment users share information with these agents, such as large language models (LLMs), their private information becomes vulnerable to exposure. In this paper, we characterize the notion of contextual privacy for user interactions with LLM-based Conversational Agents (LCAs). Contextual privacy aims to minimize privacy risks by ensuring that users (senders) disclose only information that is both relevant and necessary for achieving their intended goals when interacting with LCAs (untrusted receivers). Through a formative design user study, we observe how even "privacy-conscious" users inadvertently reveal sensitive information through indirect disclosures. Based on insights from this study, we propose a locally deployable framework that operates between users and LCAs, identifying and reformulating out-of-context information in user prompts. Our evaluation using examples from ShareGPT shows that lightweight models can effectively implement this framework, achieving strong gains in contextual privacy while preserving the user's intended interaction goals. Notably, about 76% of participants in our human evaluation preferred the reformulated prompts over the original ones, validating the usability and effectiveness of contextual privacy in our proposed framework. We open-source the code at https://github.com/IBM/contextual-privacy-LLM.
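
The idea of a local layer that sits between the user and the LCA and rewrites the prompt before it leaves the machine can be illustrated with a toy sketch. This is not the authors' implementation: the abstract describes lightweight models that judge what is out of context for the user's goal, whereas the sketch below swaps in two hand-written regular expressions and treats every match as out of context. The names `PATTERNS`, `find_spans`, and `reformulate` are hypothetical.

```python
import re

# Hypothetical stand-in for a learned detector: two regexes, chosen only
# for illustration. A real system would judge relevance to the user's goal.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def find_spans(prompt):
    """Return non-overlapping (start, end, label) spans, sorted by start."""
    found = sorted(
        (m.start(), m.end(), label)
        for label, pattern in PATTERNS.items()
        for m in pattern.finditer(prompt)
    )
    kept = []
    for span in found:
        # Drop any span that overlaps the previously kept one.
        if not kept or span[0] >= kept[-1][1]:
            kept.append(span)
    return kept


def reformulate(prompt):
    """Replace each detected span with a placeholder such as [EMAIL]."""
    result = prompt
    # Work right to left so earlier offsets stay valid after each edit.
    for start, end, label in reversed(find_spans(prompt)):
        result = result[:start] + f"[{label}]" + result[end:]
    return result


if __name__ == "__main__":
    print(reformulate("Draft a refund request. My email is jane@example.com."))
```

In this sketch the reformulated prompt, not the original, is what would be forwarded to the remote agent; the user's goal (drafting a refund request) is unaffected by the substitution.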

Ivoline Ngong, Swanand Kadhe, Hao Wang, Keerthiram Murugesan, Justin D. Weisz, Amit Dhurandhar, Karthikeyan Natesan Ramamurthy • 2025

Related benchmarks

Task	Dataset	Result
PII detection	CAPID (test)	Span Precision: 59.97	7
PII detection	Reddit 150 samples (test)	Span Precision: 62.58	7
Question Answering Utility Evaluation	Reddit (test)	GPT-4 Score: 0.58	2
Question Answering Utility Evaluation	CAPID (test)	GPT-4 Score: 51	2
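
For readers unfamiliar with the span precision metric listed for the PII detection rows, here is a minimal sketch of one common way to compute it. It assumes exact matching on (start, end) character offsets; the benchmarks above may use a different matching rule (for example partial overlap or label-aware matching), so treat this as an illustration of the general idea rather than the evaluation protocol behind the reported numbers. The function name `span_precision` is hypothetical.

```python
def span_precision(predicted, gold):
    """Fraction of predicted spans that exactly match a gold span.

    Spans are (start, end) offset pairs. Exact matching is an assumption
    made for this sketch; other matching rules are possible.
    """
    predicted_set = set(predicted)
    if not predicted_set:
        # No predictions: define precision as 0.0 to avoid division by zero.
        return 0.0
    correct = predicted_set & set(gold)
    return len(correct) / len(predicted_set)


if __name__ == "__main__":
    predicted = [(0, 4), (10, 15), (20, 25)]
    gold = [(0, 4), (20, 25), (30, 35)]
    # Two of the three predicted spans appear in the gold set.
    print(round(100 * span_precision(predicted, gold), 2))
```

Multiplying by 100, as in the usage lines, gives a percentage-style value.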

