SafeGPT: Preventing Data Leakage and Unethical Outputs in Enterprise LLM Use

About

Large Language Models (LLMs) are transforming enterprise workflows, but they introduce security and ethics challenges when employees inadvertently share confidential data or generate policy-violating content. This paper proposes SafeGPT, a two-sided guardrail system that prevents sensitive data leakage and unethical outputs. SafeGPT combines input-side detection and redaction, output-side moderation and reframing, and human-in-the-loop feedback. Experiments show that SafeGPT effectively reduces data leakage risk and biased outputs while maintaining user satisfaction.
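
To make the two-sided design concrete, here is a minimal sketch of how an input-side redaction step, an output-side moderation step and a human-review queue can wrap a single LLM call. It is not SafeGPT's implementation: the regex detectors (`SENSITIVE_PATTERNS`), the placeholder blocklist (`BLOCKED_TERMS`), the names `redact_input`, `moderate_output` and `guarded_call`, and the choice to "reframe" by substituting a fixed notice are all hypothetical stand-ins for the paper's actual detectors and moderation models.

```python
import re
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical input-side detectors: simple regexes standing in for
# whatever PII / secret detection SafeGPT actually uses.
SENSITIVE_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "API_KEY": re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"),
}

# Hypothetical output-side policy: a placeholder blocklist in place of a
# real moderation classifier.
BLOCKED_TERMS = {"slur", "threat"}

WITHHELD_MESSAGE = "This response was withheld because it may violate usage policy."


@dataclass
class GuardrailResult:
    prompt_sent: str     # prompt after redaction (what the LLM actually sees)
    response_shown: str  # response after moderation (what the user sees)
    redactions: list[str] = field(default_factory=list)
    flagged_output: bool = False


def redact_input(prompt: str) -> tuple[str, list[str]]:
    """Input side: replace every detected sensitive span with a typed placeholder."""
    found = []
    for label, pattern in SENSITIVE_PATTERNS.items():
        if pattern.search(prompt):
            found.append(label)
            prompt = pattern.sub(f"[{label}]", prompt)
    return prompt, found


def moderate_output(response: str) -> tuple[str, bool]:
    """Output side: swap a policy-violating response for a neutral notice."""
    words = set(re.findall(r"[a-z]+", response.lower()))
    if words & BLOCKED_TERMS:
        return WITHHELD_MESSAGE, True
    return response, False


def guarded_call(
    prompt: str,
    llm: Callable[[str], str],
    review_queue: list[GuardrailResult],
) -> GuardrailResult:
    """Wrap one LLM call with input redaction and output moderation."""
    safe_prompt, redactions = redact_input(prompt)
    shown, flagged = moderate_output(llm(safe_prompt))
    result = GuardrailResult(safe_prompt, shown, redactions, flagged)
    if redactions or flagged:
        # Human-in-the-loop: queue every intervention for reviewer feedback.
        review_queue.append(result)
    return result


if __name__ == "__main__":
    def echo_llm(p: str) -> str:  # stand-in for a real model call
        return f"Received: {p}"

    queue: list[GuardrailResult] = []
    r = guarded_call("Email alice@corp.com the Q3 numbers", echo_llm, queue)
    print(r.prompt_sent)  # Email [EMAIL] the Q3 numbers
    print(len(queue))     # 1
```

Keeping the redacted prompt, the shown response and the intervention flags in one record is a design choice for the sketch: it gives a human reviewer everything needed to judge whether the guardrail over- or under-triggered, which is the kind of signal a feedback loop would consume.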

Pratyush Desai, Luoxi Tang, Yuqiao Meng, Zhaohan Xi · 2026

Related benchmarks

Task                               Dataset              Metric     Result  Rank
Toxicity Detection                 ToxicChat            F1 Score   1       9
Sensitive Information Detection    PIIBench             Precision  100     5
Enterprise Data Leakage Detection  EnterpriseScenarios  Precision  40.5    5