Long-context Conversational Question Answering on LoCoMo Single-hop
[Chart: G-EVAL Score over time. Best result: 2.98 (EviMem), Apr 30, 2026. Selectable metrics: G-EVAL Score, Judge Accuracy, F1, ROUGE-L, BERTScore.]
Updated 1mo ago
Evaluation Results
| Method | System | Date | G-EVAL Score | Judge Accuracy | F1 | ROUGE-L | BERTScore |
|---|---|---|---|---|---|---|---|
| EviMem | EviMem (Full) | 2026.04 | 2.98 | 68.2 | 20.5 | 18.7 | 85.7 |
| MIRIX | MIRIX | 2026.04 | 2.93 | 69.6 | 13 | 11.2 | 84.3 |
| Single-pass | Single-pass (La... | 2026.04 | 2.51 | 61.3 | 17.3 | 15.5 | 85.1 |