Software Engineering Question Answering on SWE-QA Pro
[Figure: Rubric Judge Accuracy over time. Top result: Predicate-Based Belief State, 51.4 (May 7, 2026).]
Updated 23d ago
Evaluation Results
| Method | Configuration | Date | Rubric Judge Accuracy |
|---|---|---|---|
| Predicate-Based Belief State | Harness=Iter-RetGen, R... | 2026.05 | 51.4 |
| Baseline | Harness=Iter-RetGen | 2026.05 | 50.9 |
| Baseline | Harness=MemGPT | 2026.05 | 48.9 |
| Predicate-Based Belief State | Harness=Iter-RetGen, R... | 2026.05 | 48.4 |
| Predicate-Based Belief State | Harness=MemGPT, Repres... | 2026.05 | 47.7 |
| Lobotomized | Harness=MemGPT | 2026.05 | 43.6 |
| Predicate-Based Belief State | Harness=ReAct, Represe... | 2026.05 | 43 |
| Predicate-Based Belief State | Harness=MemGPT, Repres... | 2026.05 | 42.3 |
| Baseline | Harness=ReAct | 2026.05 | 39.8 |
| Predicate-Based Belief State | Harness=ReAct, Represe... | 2026.05 | 37.3 |
| Predicate-Based Belief State | Harness=IRCoT, Represe... | 2026.05 | 34.6 |
| Lobotomized | Harness=ReAct | 2026.05 | 33.7 |
| Predicate-Based Belief State | Harness=IRCoT, Represe... | 2026.05 | 33.3 |
| Lobotomized | Harness=IRCoT | 2026.05 | 31.7 |
| Baseline | Harness=IRCoT | 2026.05 | 27.3 |