Simulating Human Memory with Language Models

About

Language models are increasingly being deployed as user simulators, but their memory is far more reliable than that of real users. To measure this gap, we run a series of classic memory experiments from psychology on both humans and language models. Across tasks, we find that out-of-the-box language models exhibit better memory than humans, even when prompted to imitate human behavior. We then show that better prompting strategies and the use of a compactor can cause language models to forget content in a more human-like way. Using these methods, we show preliminary evidence that language models with human-like memory constraints can function as more effective user simulators in a downstream education task. Finally, we release human reference data and benchmarks to support future work on simulating human memory with language models.

Qihan Wang, Nicholas Tomlin, Michael Hu, Brian Dillon, Tal Linzen • 2026

Related benchmarks

Task	Dataset	Result
Digit Span	Digit Span	Human-Model Similarity 0.95	42
N-Back	N-back	Human-Model Similarity (1 - NWD) 0.524	12
Aggregate Cognitive Performance	Average	Human-Model Similarity (1-nWD) 6.9	8
Narrative QA	Narrative QA	Score 5.72	7
N-Back	N-back	Accuracy 87.4	4
Narrative Free Recall	Narrative Free Recall	Recall Accuracy 73.64	4
Craft Task	Craft Task	Score 14.14	3
Reverse Digit Span	Reverse Digit Span	Score 17	3
Factual QA	Factual QA	Score 7.56	3
Variable Mapping	Variable Mapping	Score 8.56	3

Showing 10 of 12 rows
