STEM Reasoning on GPQA-Diamond (Pass@1)
Top result: Gemini 3-pro, 91.9 Pass@1 (Mar 21, 2026)
Evaluation Results
Method              Dataset         Date      Pass@1
Gemini 3-pro        GPQA-Diamond    2026.03   91.9
Gemini 2.5-pro      GPQA-Diamond    2026.03   86.4
GPT-5 High          GPQA-Diamond    2026.03   85.7
Seed1.8             GPQA-Diamond    2026.03   83.8
Claude Sonnet-4.5   GPQA-Diamond    2026.03   83.4
Gemini 3-pro        PHYBench        2026.03   59
Gemini 3-pro        BIOBench        2026.03   51.9
Gemini 2.5-pro      PHYBench        2026.03   48
GPT-5 High          BIOBench        2026.03   48
Claude Sonnet-4.5   BIOBench        2026.03   44.6
Seed1.8             BIOBench        2026.03   42.3
Gemini 2.5-pro      BIOBench        2026.03   41.5
Seed1.8             PHYBench        2026.03   41
GPT-5 High          PHYBench        2026.03   40
Claude Sonnet-4.5   PHYBench        2026.03   31