Long-context Conversational Question Answering on LoCoMo Open-domain
Figure: G-EVAL score over time (best: MIRIX, 3.24, Apr 30, 2026)
Evaluation Results
Method       Configuration              Date     G-EVAL  Judge Accuracy  F1 Score  ROUGE-L  BERTScore
MIRIX        System=MIRIX               2026.04  3.24    91.6            13.2      13.2     84.6
EviMem       System=EviMem (Full)       2026.04  3.17    85.9            19.2      19.1     85.7
Single-pass  System=Single-pass (La...  2026.04  2.6     75.5            16.4      16.2     85.2
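The table shows a trade-off between the judge-based metrics and the lexical-overlap metrics. As a minimal illustration, the Python sketch below picks the top method per metric from the transcribed scores. It assumes that higher is better for all five metrics. The names METRICS, RESULTS and leader_per_metric are hypothetical and exist only for this example.

```python
# Scores transcribed from the Evaluation Results table.
# Assumption: higher is better for every metric listed here.
METRICS = ["G-EVAL", "Judge Accuracy", "F1 Score", "ROUGE-L", "BERTScore"]
RESULTS = {
    "MIRIX": [3.24, 91.6, 13.2, 13.2, 84.6],
    "EviMem": [3.17, 85.9, 19.2, 19.1, 85.7],
    "Single-pass": [2.6, 75.5, 16.4, 16.2, 85.2],
}


def leader_per_metric(results, metrics):
    """Return {metric: method with the highest score on that metric} (hypothetical helper)."""
    leaders = {}
    for i, metric in enumerate(metrics):
        # max() over method names, keyed by that method's score in column i
        leaders[metric] = max(results, key=lambda method: results[method][i])
    return leaders


if __name__ == "__main__":
    for metric, method in leader_per_metric(RESULTS, METRICS).items():
        print(f"{metric}: {method}")
```

Run on these numbers, MIRIX leads on G-EVAL and Judge Accuracy, and EviMem leads on F1 Score, ROUGE-L and BERTScore. No single system ranks first on every metric.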