
PersonaTrace: Synthesizing Realistic Digital Footprints with LLM Agents

About

Digital footprints (records of individuals' interactions with digital systems) are essential for studying behavior, developing personalized applications, and training machine learning models. However, research in this area is often hindered by the scarcity of diverse and accessible data. To address this limitation, we propose a novel method for synthesizing realistic digital footprints using large language model (LLM) agents. Starting from a structured user profile, our approach generates diverse and plausible sequences of user events, then produces the corresponding digital artifacts, such as emails, messages, calendar entries, and reminders. Intrinsic evaluation results demonstrate that the generated dataset is more diverse and realistic than existing baselines. Moreover, models fine-tuned on our synthetic data outperform those trained on other synthetic datasets when evaluated on real-world out-of-distribution tasks.
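
The three-stage flow described in the abstract (structured profile, then event sequence, then digital artifacts) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the class names, the profile fields, the prompts, and the `echo_llm` stub are hypothetical and are not taken from the paper, whose actual agent design is not described on this page.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class UserProfile:
    # Hypothetical fields; the paper's real profile schema is not shown here.
    name: str
    occupation: str
    interests: List[str]


@dataclass
class Event:
    description: str


@dataclass
class Artifact:
    kind: str  # e.g. "email", "message", "calendar", "reminder"
    body: str


# Any text-in, text-out callable can stand in for the LLM agent.
LLM = Callable[[str], str]


def generate_events(profile: UserProfile, llm: LLM, n: int) -> List[Event]:
    """Stage 1 (assumed): request n plausible events for the profile."""
    events = []
    for i in range(n):
        prompt = (
            f"Profile: {profile.name}, {profile.occupation}, "
            f"interests: {', '.join(profile.interests)}. "
            f"Describe plausible event #{i + 1}."
        )
        events.append(Event(description=llm(prompt)))
    return events


def render_artifacts(event: Event, llm: LLM, kinds: Sequence[str]) -> List[Artifact]:
    """Stage 2 (assumed): turn one event into one artifact per requested kind."""
    return [
        Artifact(kind=kind, body=llm(f"Write a {kind} that would result from: {event.description}"))
        for kind in kinds
    ]


def synthesize(
    profile: UserProfile,
    llm: LLM,
    n_events: int = 2,
    kinds: Sequence[str] = ("email", "calendar"),
) -> List[Artifact]:
    """Run both stages and collect the artifacts in event order."""
    artifacts: List[Artifact] = []
    for event in generate_events(profile, llm, n_events):
        artifacts.extend(render_artifacts(event, llm, kinds))
    return artifacts


def echo_llm(prompt: str) -> str:
    """Deterministic stub so the sketch runs without a model."""
    return f"[generated from: {prompt[:40]}...]"


if __name__ == "__main__":
    profile = UserProfile("Ana", "nurse", ["cycling"])
    for artifact in synthesize(profile, echo_llm):
        print(artifact.kind, artifact.body)
```

Passing the model as a plain callable keeps the sketch independent of any particular LLM client; replacing `echo_llm` with a real client call would be the only change needed to try it against an actual model.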

Minjia Wang, Yunfeng Wang, Xiao Ma, Dexin Lv, Qifan Guo, Lynn Zheng, Benliang Wang, Lei Wang, Jiannan Li, Yongwei Xing, David Xu, Zheng Sun • 2026

Related benchmarks

Task                      Dataset              Metric                Result  Rank
Intrinsic Evaluation      Email Datasets       Pairwise Correlation  0.2093  10
Text Classification       ENRON                Accuracy              61      9
Email Quality Evaluation  Email Datasets       Tone                  4.95    8
Email Categorization      Private              Accuracy              0.18    6
Email Categorization      Private w/o Spam     Accuracy              0.51    6
Next Message Prediction   PERSONA-CHAT (test)  Accuracy              11.5    6
Next Message Prediction   Private (test)       Accuracy              9.62    6
Question Answering        ENRON                ROUGE Score           44.35   6
Question Answering        Human-Gen Phishing   ROUGE                 0.4089  6
Question Answering        Private              ROUGE Score           4.65    6
Showing 10 of 21 rows
