Long-context Conversational Question Answering on LoCoMo Overall
[Chart: G-EVAL over time; top entry EviMem at 2.81 (Apr 30, 2026). Selectable metrics: G-EVAL, Judge Accuracy, F1 Score, ROUGE-L, BERTScore.]
Updated 1mo ago
Evaluation Results
Method       System              Date     G-EVAL  Judge Accuracy  F1 Score  ROUGE-L  BERTScore
EviMem       EviMem (Full)       2026.04  2.81    76.5            17.7      17       85.2
MIRIX        MIRIX               2026.04  2.75    75.9            11.3      10.8     84
Single-pass  Single-pass (La...  2026.04  2.24    66.4            12.7      12.1     84.4