Long-context memory evaluation on MemBench
[Chart: Recall over time (metric selector: Recall, Reasoning Score, Robustness Score, Overall Score). Highlighted result: EVOLVEMEM, Recall 87.5, May 13, 2026.]
Updated 19d ago
Evaluation Results
Method     Backbone  Date     Recall  Reasoning Score  Robustness Score  Overall Score
EVOLVEMEM  GPT-4o    2026.05  87.5    66.7             50                67.9
EVOLVEMEM  GPT-5.1   2026.05  87.5    66.7             62.5              71.4
RecentMem  GPT-5.1   2026.05  75      50               62.5              60.7
MemGPT     GPT-5.1   2026.05  75      58.3             50                60.7
MemBank    GPT-5.1   2026.05  75      50               75                64.3
RecentMem  GPT-4o    2026.05  62.5    50               62.5              57.1
MemGPT     GPT-4o    2026.05  62.5    50               62.5              57.1
SCMem      GPT-4o    2026.05  62.5    25               37.5              39.3
SCMem      GPT-5.1   2026.05  50      25               25                32.1
MemBank    GPT-4o    2026.05  37.5    33.3             75                46.4
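The page does not say how the Overall Score is derived. Every published row is, however, consistent with a weighted mean of the three sub-scores using weights 2:3:2 (Recall : Reasoning : Robustness). That is what pooled accuracy would produce if, hypothetically, the subsets held 8, 12 and 8 questions, which fits the fact that Recall and Robustness values are multiples of 12.5 (1/8) and Reasoning values are multiples of about 8.33 (1/12). The sketch below checks this inferred formula against the table. The weights and the helper names `overall_score` and `mismatches` are our assumptions, not MemBench's documented definition.

```python
# Sketch: check whether Overall Score matches a 2:3:2 weighted mean of the
# three sub-scores. The weights are an ASSUMPTION inferred from the table,
# not a formula documented by MemBench.

# (method, backbone, recall, reasoning, robustness, overall) as published
ROWS = [
    ("EVOLVEMEM", "GPT-4o", 87.5, 66.7, 50.0, 67.9),
    ("EVOLVEMEM", "GPT-5.1", 87.5, 66.7, 62.5, 71.4),
    ("RecentMem", "GPT-5.1", 75.0, 50.0, 62.5, 60.7),
    ("MemGPT", "GPT-5.1", 75.0, 58.3, 50.0, 60.7),
    ("MemBank", "GPT-5.1", 75.0, 50.0, 75.0, 64.3),
    ("RecentMem", "GPT-4o", 62.5, 50.0, 62.5, 57.1),
    ("MemGPT", "GPT-4o", 62.5, 50.0, 62.5, 57.1),
    ("SCMem", "GPT-4o", 62.5, 25.0, 37.5, 39.3),
    ("SCMem", "GPT-5.1", 50.0, 25.0, 25.0, 32.1),
    ("MemBank", "GPT-4o", 37.5, 33.3, 75.0, 46.4),
]

# Hypothetical weights, e.g. 8 recall, 12 reasoning and 8 robustness questions.
WEIGHTS = (2, 3, 2)


def overall_score(recall, reasoning, robustness, weights=WEIGHTS):
    """Weighted mean of the three sub-scores under the given weights."""
    w_rec, w_rea, w_rob = weights
    total = w_rec * recall + w_rea * reasoning + w_rob * robustness
    return total / sum(weights)


def mismatches(rows, weights=WEIGHTS, tol=0.06):
    """Return rows whose published Overall differs from the recomputed value.

    The tolerance absorbs the one-decimal rounding of the published scores.
    """
    bad = []
    for method, backbone, rec, rea, rob, published in rows:
        computed = overall_score(rec, rea, rob, weights)
        if abs(computed - published) > tol:
            bad.append((method, backbone, round(computed, 2), published))
    return bad


if __name__ == "__main__":
    for method, backbone, rec, rea, rob, published in ROWS:
        computed = overall_score(rec, rea, rob)
        print(f"{method:10} {backbone:8} computed={computed:.1f} published={published}")
    print("mismatches (2:3:2):", mismatches(ROWS))
    print("mismatches (equal weights):", len(mismatches(ROWS, weights=(1, 1, 1))))
```

With the assumed 2:3:2 weights no row mismatches, while an unweighted mean of the three sub-scores fails on several rows (for example EVOLVEMEM with GPT-4o gives about 68.1, not 67.9). Treat this as a consistency check on the published numbers rather than the benchmark's official scoring rule.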