
TokMem: One-Token Procedural Memory for Large Language Models

About

Large language models are typically controlled via prompts, which must be re-processed for every new query and are difficult to reuse modularly. We introduce TokMem, a procedural memory framework that compiles each reusable task procedure into a single trainable memory token. Each token serves as both an index for its procedure and a control signal that steers generation, enabling targeted behaviors with constant-size overhead. TokMem keeps the backbone LLM frozen and stores procedural knowledge entirely in these dedicated units, so new procedures can be added continually without interfering with existing ones. We evaluate TokMem in two settings: atomic recall over 1,000 Super-Natural Instructions tasks and compositional recall on multi-step function calling. Our results show that TokMem consistently outperforms retrieval-augmented prompting while avoiding repeated context overhead. Moreover, it matches or exceeds parameter-efficient fine-tuning with substantially fewer trainable parameters.
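The core mechanism can be sketched as prompt-tuning-style training of a single embedding: a trainable memory token is prepended to the input embeddings while every backbone parameter stays frozen. The sketch below is illustrative only, assuming a toy GRU language model as a stand-in for the pretrained backbone and a dummy training objective; names like `TinyLM` and `forward_with_memory` are hypothetical, not the paper's implementation.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
VOCAB, DIM = 32, 16

class TinyLM(nn.Module):
    """Toy stand-in for a pretrained LM backbone (hypothetical)."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        self.rnn = nn.GRU(DIM, DIM, batch_first=True)
        self.head = nn.Linear(DIM, VOCAB)

    def forward(self, inputs_embeds):
        h, _ = self.rnn(inputs_embeds)
        return self.head(h)

backbone = TinyLM()
for p in backbone.parameters():
    p.requires_grad_(False)  # backbone stays frozen

# One trainable memory token per procedure: the only trained parameters.
memory_token = nn.Parameter(torch.randn(1, 1, DIM) * 0.02)

def forward_with_memory(token_ids):
    tok = backbone.embed(token_ids)                 # (B, T, DIM)
    mem = memory_token.expand(tok.size(0), -1, -1)  # prepend the memory token
    return backbone(torch.cat([mem, tok], dim=1))   # (B, T+1, VOCAB)

# Snapshots to verify that only the memory token is updated.
head_before = backbone.head.weight.clone()
mem_before = memory_token.detach().clone()

# Train the single token on a dummy objective (illustrative only).
opt = torch.optim.Adam([memory_token], lr=0.05)
x = torch.randint(0, VOCAB, (8, 6))
target = torch.cat([x, x[:, -1:]], dim=1)  # dummy targets of length T+1
loss_fn = nn.CrossEntropyLoss()

losses = []
for _ in range(50):
    logits = forward_with_memory(x)
    loss = loss_fn(logits.reshape(-1, VOCAB), target.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
    losses.append(loss.item())
```

Because gradients flow only into `memory_token`, adding a new procedure later means training another token with the same frozen backbone, which is why the paper can add procedures continually without interference.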

Zijun Wu, Yongchang Hao, Lili Mou • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Atomic Memory Recall | Super-Natural Instructions (SNI) (test) | ROUGE-L (10 tasks) | 75.6 | 18 |
| Argument Generation | APIGen sampled (test) | Argument F1 (2 calls) | 88.1 | 15 |
| Tool Selection | APIGen sampled (test) | Tool Selection F1 (2 calls) | 99.4 | 15 |
