Long-context Question Answering on LoCoMo (Rubric Judge)
[Chart: Rubric Judge Accuracy over time. Best result: 67.3 (Baseline), May 7, 2026.]
Updated 24d ago
Evaluation Results

Method                        Configuration              Date     Rubric Judge Accuracy
Baseline                      Harness=Iter-RetGen        2026.05  67.3
Predicate-Based Belief State  Harness=ReAct, Represe...  2026.05  66.6
Predicate-Based Belief State  Harness=Iter-RetGen, R...  2026.05  65.9
Predicate-Based Belief State  Harness=ReAct, Represe...  2026.05  65.7
Predicate-Based Belief State  Harness=IRCoT, Represe...  2026.05  65.4
Predicate-Based Belief State  Harness=MemGPT, Repres...  2026.05  65.4
Baseline                      Harness=ReAct              2026.05  65.1
Predicate-Based Belief State  Harness=Iter-RetGen, R...  2026.05  65
Predicate-Based Belief State  Harness=IRCoT, Represe...  2026.05  63.6
Baseline                      Harness=IRCoT              2026.05  63
Predicate-Based Belief State  Harness=MemGPT, Repres...  2026.05  62.5
Lobotomized                   Harness=ReAct              2026.05  59.4
Lobotomized                   Harness=IRCoT              2026.05  57.2
Baseline                      Harness=MemGPT             2026.05  57.2
Lobotomized                   Harness=MemGPT             2026.05  56.5