Science Reasoning on GPQA Diamond (Pass@1)

91.9 Pass@1

Gemini-3.0 Pro

Updated 1mo ago

Evaluation Results

Method (date)	Links	Pass@1 (%)
Gemini-3.0 Pro 2025.12		91.9
GPT-5 High 2025.12		85.7
Kimi-K2 2025.12		84.5
Claude-4.5-Sonnet 2025.12		83.4
DeepSeek-V3.2 2025.12		82.4
MiniMax M2 2025.12		77.7
TRAPO 2025.12		43.9
Low-temperature 2026.04		43
SMC reward-guided lookahead 2026.04		42.4
SMC reward-guided lookahead 2026.04		42.4
Scalable Power Sampling 2026.04		40.9
Fully Supervised 2025.12		40.4
Fully Supervised 2025.12		39.9
GRPO (MATH) 2026.04		39.9
Mixed 2026.06		39.9
MCMC Power Sampling 2026.04		38.9
QuestA 2026.06		38.89
Dynamic 2026.06		38.89
SMC (reward) 2026.04		38.8
SMC reward-guided lookahead 2026.04		38.4
SMC (reward) 2026.04		38.4
Scaf-GRPO 2026.06		38.38
TRAPO 2025.12		37.9
Ablation 2026.06		36.87
Fully Supervised 2025.12		36.4
Scalable Power Sampling 2026.04		36.4
Base 2026.06		36.36
TTRL 2025.12		35.4
TTRL 2025.12		35.4
GRPO (MATH) 2026.04		35.4
Low-temperature 2026.04		35.3
Scalable Power Sampling 2026.04		34.9
Power-SMC 2026.04		34.9
MCMC Power Sampling 2026.04		34.5
Vanilla 2026.06		34.34
Sub-only 2026.06		34.34
Best-of-N 2026.04		34.3
SFT 2026.06		33.84
Sentence-level Entropy 2025.12		33.8
Best-of-N 2026.04		33.8
Token-level Entropy 2025.12		33.3
Base 2026.04		33.3
GRPO (MATH) 2026.04		33.3
Power-SMC 2026.04		32.6
Sentence-level Entropy 2025.12		32.3
Token-level Entropy 2025.12		32.3
SMC (reward) 2026.04		32.3
MCMC Power Sampling 2026.04		31.8
Power-SMC 2026.04		31.3
Self-certainty 2025.12		30.8
Self-certainty 2025.12		30.3
Low-temperature 2026.04		30.3
Best-of-N 2026.04		28.2
Base 2026.04		27.8
Base 2026.04		27.8
Qwen-Instruct 2025.12		24.7
Qwen-Base 2025.12		11.1