Multi-hop Question Answering on MuSiQue (Rubric Judge Accuracy)
[Chart: Rubric Judge Accuracy over time; best result 39.6 (Predicate-Based Belief State), May 7, 2026]
Evaluation Results
Method                        Configuration              Date     Rubric Judge Accuracy
Predicate-Based Belief State  Harness=Iter-RetGen, R...  2026.05  39.6
Baseline                      Harness=Iter-RetGen        2026.05  37.1
Predicate-Based Belief State  Harness=ReAct, Represe...  2026.05  36.8
Predicate-Based Belief State  Harness=MemGPT, Repres...  2026.05  34.4
Predicate-Based Belief State  Harness=Iter-RetGen, R...  2026.05  33.8
Baseline                      Harness=MemGPT             2026.05  31.4
Lobotomized                   Harness=MemGPT             2026.05  30.6
Predicate-Based Belief State  Harness=ReAct, Represe...  2026.05  30.1
Baseline                      Harness=ReAct              2026.05  29.6
Predicate-Based Belief State  Harness=MemGPT, Repres...  2026.05  29.6
Predicate-Based Belief State  Harness=IRCoT, Represe...  2026.05  25.9
Predicate-Based Belief State  Harness=IRCoT, Represe...  2026.05  21.1
Baseline                      Harness=IRCoT              2026.05  14.4
Lobotomized                   Harness=ReAct              2026.05  12.2
Lobotomized                   Harness=IRCoT              2026.05  9.0
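Because each harness appears with a Baseline row and two Predicate-Based Belief State rows, a useful comparison is the gain of the best Predicate-Based Belief State configuration over the Baseline within the same harness. The minimal Python sketch below computes that gain from the scores listed above. It assumes rows can be grouped by the harness named in each configuration string. The second setting of those rows is truncated on the page, so both scores are kept without labels. The variable and function names (`baseline`, `pbbs`, `best_gain`) are illustrative only.

```python
from typing import Dict, List

# Rubric Judge Accuracy scores copied from the results table, grouped by harness.
baseline: Dict[str, float] = {
    "Iter-RetGen": 37.1,
    "ReAct": 29.6,
    "MemGPT": 31.4,
    "IRCoT": 14.4,
}

# Two Predicate-Based Belief State (PBBS) rows per harness; their second
# configuration setting is truncated on the page, so they are unlabeled here.
pbbs: Dict[str, List[float]] = {
    "Iter-RetGen": [39.6, 33.8],
    "ReAct": [36.8, 30.1],
    "MemGPT": [34.4, 29.6],
    "IRCoT": [25.9, 21.1],
}


def best_gain(harness: str) -> float:
    """Best PBBS score minus the Baseline score for one harness, in accuracy points."""
    return round(max(pbbs[harness]) - baseline[harness], 1)


gains = {harness: best_gain(harness) for harness in baseline}

# Print harnesses from largest to smallest gain.
for harness, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{harness:12s} {gain:+.1f}")
```

With these numbers the largest gain is on IRCoT (+11.5) and the smallest on Iter-RetGen (+2.5). Taking the maximum hides one detail, though: for Iter-RetGen (33.8 vs 37.1) and MemGPT (29.6 vs 31.4), the weaker Predicate-Based Belief State row scores below the Baseline.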