Temporal Reasoning on LongMemEval-M
Chart: F1 Score by method (leading result: HIPPOCAMPUS, 12.69 F1, Feb 14, 2026). Metrics reported: F1 Score, Accuracy, LLM-as-a-Judge Score.
Evaluation Results
Method | Date | F1 Score | Accuracy | LLM-as-a-Judge Score
HIPPOCAMPUS | 2026.02 | 12.69 | 6.77 | 1.86
MemOS | 2026.02 | 9.59 | 5.08 | 1.65
MemoryOS | 2026.02 | 8.29 | 4.40 | 1.57
A-mem | 2026.02 | 7.00 | 3.72 | 1.49
MemGPT | 2026.02 | 4.45 | 2.37 | 1.02
MemoryBank | 2026.02 | 3.81 | 2.03 | 1.11
ReadAgent | 2026.02 | 3.17 | 1.69 | 1.04
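The page does not say how the F1 Score is computed. The sketch below shows one common definition for free-text QA, SQuAD-style token-overlap F1 between a predicted answer and a reference answer. Using this definition is an assumption, and so are the normalization rules (lowercasing, removing punctuation and English articles). The function names `normalize` and `token_f1` are hypothetical. The sketch also assumes that the leaderboard averages per-question scores and reports them as percentages, so it should not be read as the benchmark's official scorer.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation and the articles a/an/the, then split on whitespace.

    These normalization rules are an assumption modeled on SQuAD-style scoring.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between one predicted answer and one reference answer.

    Returns 0.0 when no tokens overlap, including when either side is empty.
    """
    pred_tokens = normalize(prediction)
    ref_tokens = normalize(reference)
    # Multiset intersection: each shared token counts at most min(count_pred, count_ref) times.
    common = Counter(pred_tokens) & Counter(ref_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Precision 2/3, recall 1.0, so F1 = 0.8.
    print(token_f1("on March 3rd", "March 3rd"))
```

Under this definition, an extra token in the prediction lowers precision but not recall. The assumed benchmark-level score would be the mean of `token_f1` over all temporal-reasoning questions, multiplied by 100.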