Knowledge Reasoning on GPQA Diamond (pass@1)
[Chart: best pass@1 over time. Top result: 82.8, Gemini-2.5 Flash-Thinking, Dec 15, 2025.]
Evaluation Results
| Model | Configuration | Date | pass@1 (%) |
|---|---|---|---|
| Gemini-2.5 Flash-Thinking | Thinking Mode=true | 2025.12 | 82.8 |
| DeepSeek-R1 0528 671B | Parameters=671B, Think... | 2025.12 | 81 |
| Nemotron-Cascade 14B-Thinking | Parameters=14B, Thinki... | 2025.12 | 69.6 |
| Nemotron Cascade-8B | Parameters=8B, Thinkin... | 2025.12 | 66.5 |
| Nemotron-Nano 9B-v2 | Parameters=9B-v2 | 2025.12 | 64 |
| Qwen3 14B | Parameters=14B | 2025.12 | 64 |
| Qwen3 8B | Parameters=8B | 2025.12 | 62 |
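The page does not say how many answers were sampled per question to produce these pass@1 figures. Under the common convention (n sampled answers per question, c of them correct), pass@1 reduces to c/n per question, averaged over the benchmark. The sketch below assumes that convention and uses the unbiased pass@k estimator with k = 1. The function names and the (n, c) input format are hypothetical, not part of the leaderboard.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one question (assumed convention).

    n: number of sampled answers for the question
    c: number of those answers that were correct
    k: attempt budget (k = 1 gives pass@1, which equals c / n)
    """
    if n - c < k:
        # Every size-k subset of the samples contains a correct answer.
        return 1.0
    # 1 minus the probability that k samples drawn without replacement are all wrong.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_1(results: list[tuple[int, int]]) -> float:
    """Mean pass@1 over questions, as a percentage like the table's scores.

    results: one hypothetical (n, c) pair per benchmark question.
    """
    scores = [pass_at_k(n, c, 1) for n, c in results]
    return 100.0 * sum(scores) / len(scores)


# Four questions, four samples each: per-question pass@1 is 1.0, 0.5, 0.0, 0.75.
print(benchmark_pass_at_1([(4, 4), (4, 2), (4, 0), (4, 3)]))  # 56.25
```

With a single sample per question (n = 1), this collapses to plain accuracy, which is why pass@1 scores on a multiple-choice benchmark such as GPQA Diamond read as percentages.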