Long-context Conversational Question Answering on LoCoMo Adversarial
[Chart: G-EVAL Score. Highlighted point: EviMem, 1.94, Apr 30, 2026. Other selectable metrics: Judge Accuracy, F1 Score, ROUGE-L, BERTScore.]
Evaluation Results
| Method | System | Links | G-EVAL Score | Judge Accuracy | F1 Score | ROUGE-L | BERTScore |
|---|---|---|---|---|---|---|---|
| EviMem | EviMem (Full) | 2026.04 | 1.94 | 55.1 | 8.1 | 8.3 | 84 |
| MIRIX | MIRIX | 2026.04 | 1.92 | 57.9 | 8.3 | 8.2 | 83.5 |
| Single-pass | Single-pass (La... | 2026.04 | 1.17 | 43.4 | 2.3 | 2.4 | 83 |