Long-context Conversational Question Answering on LoCoMo Temporal
[Chart: G-EVAL on LoCoMo Temporal. Top result: EviMem, 3.08 (Apr 30, 2026). Selectable metrics: G-EVAL, Judge Accuracy, F1 Score, ROUGE-L, BERTScore.]
Updated 1mo ago
Evaluation Results
| Method | System | Date | G-EVAL | Judge Accuracy | F1 Score | ROUGE-L | BERTScore |
|---|---|---|---|---|---|---|---|
| EviMem | EviMem (Full) | 2026.04 | 3.08 | 81.6 | 13.5 | 12.4 | 84.5 |
| MIRIX | MIRIX | 2026.04 | 2.69 | 73.3 | 8.2 | 7.3 | 83.3 |
| Single-pass | Single-pass (La... | 2026.04 | 1.87 | 58.8 | 6.9 | 6.1 | 83.9 |