AMA-Bench: Evaluating Long-Horizon Memory for Agentic Applications

About

Large Language Models (LLMs) are increasingly used as autonomous agents in complex, long-horizon applications, where effective memory is critical for sustained performance. Yet existing memory benchmarks are largely dialogue-centric, while real agent memory consists of continuous agent-environment interaction trajectories composed of states, actions, observations, and tool outputs. To address this gap, we introduce **AMA-Bench** (**A**gent **M**emory with **A**ny length), a benchmark for evaluating long-horizon memory in realistic agentic settings. AMA-Bench combines real-world agent trajectories from representative applications with expert-curated QA, as well as synthetic trajectories that scale to arbitrary horizons with rule-based QA. Our study shows that existing memory systems underperform because they fail to capture causal and objective information and rely heavily on lossy similarity-based retrieval. We further propose **AMA-Agent**, a memory system based on causality-graph construction and tool-augmented retrieval. AMA-Agent achieves **57.22%** accuracy on AMA-Bench, outperforming the strongest baseline by **11.16%**. Resources are available at: [https://ama-bench.github.io/](https://ama-bench.github.io/).
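The abstract mentions synthetic trajectories that scale to arbitrary horizons and are paired with rule-based QA. As a rough illustration of that idea only, the sketch below generates a toy trajectory of any requested length and derives a question whose answer is computed from the trajectory by a fixed rule. The `Step` format, the room names, and the functions `make_trajectory` and `last_visit_question` are hypothetical and are not taken from AMA-Bench itself.

```python
from __future__ import annotations

import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Step:
    """One agent-environment interaction (hypothetical toy format)."""
    index: int
    action: str
    observation: str


def make_trajectory(length: int, seed: int = 0) -> list[Step]:
    """Generate a toy trajectory with exactly `length` steps."""
    rng = random.Random(seed)
    rooms = ["kitchen", "hall", "lab"]  # assumed toy environment
    steps = []
    for i in range(length):
        room = rng.choice(rooms)
        steps.append(Step(i, f"goto {room}", f"You are in the {room}."))
    return steps


def last_visit_question(steps: list[Step], room: str) -> tuple[str, Optional[int]]:
    """Rule-based QA: the answer is derived from the trajectory, not annotated."""
    answer = None
    for step in steps:
        if step.action == f"goto {room}":
            answer = step.index  # keep overwriting so the last match wins
    question = f"At which step did the agent last go to the {room}?"
    return question, answer


if __name__ == "__main__":
    trajectory = make_trajectory(length=1000)
    question, answer = last_visit_question(trajectory, "lab")
    print(question, answer)
```

Because the answer is computed by a rule, the horizon can be raised by changing `length` without any additional manual labeling.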

Yujie Zhao, Boqin Yuan, Junbo Huang, Haocheng Yuan, Zhongming Yu, Haozhou Xu, Lanxiang Hu, Abhilash Shankarampeta, Zimeng Huang, Wentao Ni, Yuandong Tian, Jishen Zhao • 2026

Related benchmarks

Task	Dataset	Result
Embodied Task Completion	AlfWorld	Success Rate: 31.7	106
Agent Memory	AMA-Bench real-world	Recall (Accuracy): 62.38	14
Interactive agentic task completion	MemoryArena	Bundled Web Shop PS: 30	14
Agentic Memory Retrieval	MemoryAgentBench	Access Rate: 66	10
Long-context Question Answering	Locomo	Accuracy (LoCoMo QA): 54.5	10
Agentic Question Answering	AMABench	A-ALF Score: 10	10
Embodied AI	AMA-bench Embodied AI Domain	TTFT Mean (std) [s]: 0.9262	8
Trajectory QA	AMA-Bench Full 208-episode trajectory	F1 Score: 36.8	5
