Reasoning on GPQA Protocol A (test)

Accuracy: 87.3

OpenHands CodeActAgent + GBT-SE

[Chart: Accuracy; Jan 30, 2026]
Updated 1mo ago

Evaluation Results

Method  Links
2026.01  87.3  730.201558
2026.01  78.8  71.90.201662
2026.01  59.2  -0.4202182
2026.01  58.7  -1.6402286
2026.01  53.6  -----