LongMemEval

Benchmarks

Task Name	Dataset Name	SOTA Result
Long-context Memory Evaluation	LongMemEval	Average Score: 95.6	103
Long-context Question Answering	LongMemEval LongConvQA	SH Score: 90.3	84
Evidence-round recall	LongMemEval 100 question sample	Evidence-round Recall: 100	48
Long-term conversational memory	LongMemEval Small	F1 Score (%): 51.2	40
Memory-augmented language modeling evaluation	LONGMEMEVAL-S	Accuracy: 73.2	31
Long-term Memory	LongMemEval	Score: 90.8	30
Long-term Memory Evaluation	LongMemEval S (test)	KU (Knowledge Update): 94.4	30
Retrieval	LongMemEval	Recall@5: 99	25
Memory	LongMemEval	Accuracy: 34.72	25
Dialogue-Style Memory Reasoning	LongMemEval	EM: 65.92	24
Dialogue Memory Accuracy	LongMemEval-S (N=500)	Temporal Accuracy: 91	24
Long-term Memory Evaluation	LongMemEvalS	Overall Score: 95.6	23
Memory Question Answering	LongMemEval	Accuracy: 76	22
Long-context Memory Retrieval and Reasoning	LongMemEval 1M	F1 Score: 49.58	20
Long-context Memory Retrieval and Reasoning	LongMemEval 128K	F1 Score: 47.26	20
End-to-End Performance	LongMemEval	Top-5 Recall: 59.9	20
Runtime Agent Memory	LongMemEval	F1 Score: 40.53	20
Question Answering	LongMemEval S (test)	QA Score (TR Context): 84.21	19
Long-term Memory Retrieval	LongMemEval-S	SSU: 100	19
Long context memory management	LongMemEval	Precision: 68.5	18
Long-term dialogue memory	LongMemEval (test)	Accuracy: 85.75	18
Long-term memory evaluation	LongMemEval S	Single-User Score: 97.14	17
Retrieval	LongMemEval-S	Recall@5: 94.68	17
Long-horizon Question Answering	LongMemEval-RR	F1 Score: 46.33	16
Long-term Memory Question Answering	LongMemEval-S (500 questions)	KU Accuracy: 98.7	16

Showing 25 of 121 rows