
Long-context Question Answering on LoCoMo (Rubric Judge)

67.3 Rubric Judge Accuracy

Baseline

Updated 2mo ago

Evaluation Results

Method	Links	Rubric Judge Accuracy
Baseline 2026.05		67.3
Predicate-Based Belief State 2026.05		66.6
Predicate-Based Belief State 2026.05		65.9
Predicate-Based Belief State 2026.05		65.7
Predicate-Based Belief State 2026.05		65.4
Predicate-Based Belief State 2026.05		65.4
Baseline 2026.05		65.1
Predicate-Based Belief State 2026.05		65
Predicate-Based Belief State 2026.05		63.6
Baseline 2026.05		63
Predicate-Based Belief State 2026.05		62.5
Lobotomized 2026.05		59.4
Lobotomized 2026.05		57.2
Baseline 2026.05		57.2
Lobotomized 2026.05		56.5