Scientific Reasoning on GPQA-D (test)

68.7 Accuracy

Think

Updated 4mo ago

Evaluation Results

Method	Links	Accuracy
Think 2026.02		68.7	9,041	325.8
SpecExit 2026.02		68.7	7,011	137
EAGLE3 2026.02		67.7	8,975	212.2
NoThink* 2026.02		67.2	8,833	276.8
DEER 2026.02		67.2	9,053	505.2
SpecExit 2026.02		46	6,849	307.5
EAGLE3 2026.02		43.9	8,749	420.1
Vanilla 2026.02		43.6	8,857	574
DEER 2026.02		40.9	8,492	521.5
NoThink 2026.02		26.8	1,200	166.6