
Reasoning on GPQA Protocol A (test)

87.3 Accuracy

OpenHands CodeActAgent + GBT-SE

Updated 4mo ago

Evaluation Results

Method	Links	Accuracy
OpenHands CodeActAgent + GBT-SE 2026.01		87.3	73	0.2	0	15	58
OpenHands CodeActAgent + GBT-Basic 2026.01		78.8	71.9	0.2	0	16	62
OpenHands CodeActAgent + Global guardrail only 2026.01		59.2	-	0.4	20	21	82
OpenHands CodeActAgent 2026.01		58.7	-	1.6	40	22	86
Zero-shot prompting 2026.01		53.6	-	-	-	-	-