Instruction Following with Long-term Memory on Human Evaluation 1-10 scale (test)
[Chart: best score over time. Highlighted point: EventWeave, Coherence 8.7, Mar 29, 2025. Selectable metrics: Coherence, Correctness, Style, Helpfulness, Average Score.]
Updated 9d ago
Evaluation Results
| Method | Details | Date | Coherence | Correctness | Style | Helpfulness | Average Score |
|---|---|---|---|---|---|---|---|
| EventWeave | Base Model=GPT-4o | 2025.03 | 8.7 | 8.5 | 8.4 | 8.6 | 8.6 |
| LifeLongMem | Base Model=GPT-4o | 2025.03 | 8 | 7.9 | 8.2 | 7.9 | 8 |
| MemWalker | Base Model=GPT-4o | 2025.03 | 7.8 | 7.6 | 8.1 | 7.7 | 7.8 |
| LongMem | Base Model=GPT-4o | 2025.03 | 7.7 | 7.5 | 8 | 7.6 | 7.7 |
| ProactiveCoT | Base Model=GPT-4o | 2025.03 | 7.5 | 7.2 | 8 | 7.4 | 7.5 |
| GPT-4o | mode=vanilla | 2025.03 | 6.8 | 6.5 | 7.9 | 6.7 | 7 |
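The page does not say how the Average Score column is derived. A minimal sketch, assuming it is the unweighted mean of the four metric scores rounded to one decimal place (an assumption, not something the source states), can check the listed values. The names `ROWS` and `mean_score` are hypothetical and exist only for this illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

# Scores copied from the results above, in the order
# (Coherence, Correctness, Style, Helpfulness), followed by the listed Average Score.
ROWS = {
    "EventWeave": (("8.7", "8.5", "8.4", "8.6"), "8.6"),
    "LifeLongMem": (("8", "7.9", "8.2", "7.9"), "8"),
    "MemWalker": (("7.8", "7.6", "8.1", "7.7"), "7.8"),
    "LongMem": (("7.7", "7.5", "8", "7.6"), "7.7"),
    "ProactiveCoT": (("7.5", "7.2", "8", "7.4"), "7.5"),
    "GPT-4o": (("6.8", "6.5", "7.9", "6.7"), "7"),
}


def mean_score(scores):
    """Unweighted mean of the scores, rounded half-up to one decimal place.

    ASSUMPTION: the leaderboard does not document its averaging or rounding rule.
    Decimal is used so that ties such as 8.55 are not distorted by binary floats.
    """
    total = sum(Decimal(s) for s in scores)
    return (total / len(scores)).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


for method, (scores, listed) in ROWS.items():
    computed = mean_score(scores)
    status = "matches" if computed == Decimal(listed) else "differs from"
    print(f"{method}: computed {computed} {status} listed {listed}")
```

Under this assumption, all six listed averages agree with the computed means, which is consistent with (but does not prove) a simple unweighted average.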