
Simulating Human Memory with Language Models

About

Language models are increasingly being deployed as user simulators, but their memory is far more reliable than that of real users. To measure this gap, we run a series of classic memory experiments from psychology on both humans and language models. Across tasks, we find that out-of-the-box language models exhibit better memory than humans, even when prompted to imitate human behavior. We then show that better prompting strategies and the use of a compactor can cause language models to forget content in a more human-like way. Using these methods, we show preliminary evidence that language models with human-like memory constraints can function as more effective user simulators in a downstream education task. Finally, we release human reference data and benchmarks to support future work on simulating human memory with language models.

Qihan Wang, Nicholas Tomlin, Michael Hu, Brian Dillon, Tal Linzen • 2026

Related benchmarks

Task                             Dataset                Metric                             Result  Rank
Digit Span                       Digit Span             Human-Model Similarity             0.95    42
N-Back                           N-back                 Human-Model Similarity (1 - NWD)   0.524   12
Aggregate Cognitive Performance  Average                Human-Model Similarity (1-nWD)     6.9     8
N-Back                           N-back                 Accuracy                           87.4    4
Narrative Free Recall            Narrative Free Recall  Recall Accuracy                    73.64   4
Craft Task                       Craft Task             Score                              14.14   3
Reverse Digit Span               Reverse Digit Span     Score                              17      3
Factual QA                       Factual QA             Score                              7.56    3
Variable Mapping                 Variable Mapping       Score                              8.56    3
Word Recognition                 Word Recognition       Accuracy                           40.2    3
Showing 10 of 12 rows
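
The "Human-Model Similarity (1 - NWD)" metric is not defined on this page. As a rough illustration only, the sketch below assumes that NWD stands for a normalized Wasserstein distance between the human and model score distributions, and that normalization divides by the range of possible scores. Both are assumptions, and the function names `wasserstein_1d` and `similarity` are hypothetical rather than taken from the authors' code.

```python
from bisect import bisect_right


def wasserstein_1d(xs, ys):
    """1-Wasserstein distance between two non-empty empirical 1-D samples.

    Computed as the area between the two empirical CDFs.
    """
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    xs, ys = sorted(xs), sorted(ys)
    points = sorted(set(xs) | set(ys))
    total = 0.0
    for left, right in zip(points, points[1:]):
        # Both empirical CDFs are constant on the interval [left, right).
        cdf_x = bisect_right(xs, left) / len(xs)
        cdf_y = bisect_right(ys, left) / len(ys)
        total += abs(cdf_x - cdf_y) * (right - left)
    return total


def similarity(human, model, lo, hi):
    """Assumed metric: 1 - (Wasserstein distance / score range), floored at 0."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    nwd = wasserstein_1d(human, model) / (hi - lo)
    return max(0.0, 1.0 - nwd)


if __name__ == "__main__":
    # Hypothetical digit-span scores, for illustration only.
    human_spans = [5, 6, 7, 7, 8]
    model_spans = [9, 9, 9, 9, 9]
    print(similarity(human_spans, model_spans, lo=0, hi=9))
```

Under this reading, identical score distributions give a similarity of 1, and the value drops toward 0 as the model's scores move away from the human ones.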
