Science Question Answering on GPQA Diamond

Top accuracy: 91.9 (Gemini-3.0)

Updated 1mo ago

Evaluation Results

Method (date)	Accuracy
Gemini-3.0 2025.12	91.9	8,000
Gemma-4 31B 2026.06	86.87	5,614
GPT-5 2025.12	85.7	8,000
DeepSeek-V3.2 2025.12	85.7	16,000
Kimi-K2 2025.12	84.5	12,000
Gemma-4 31B 2026.06	84.34	5,728
DeepSeek-V3.2 2025.12	82.4	7,000
Gemma-4 31B 2026.06	78.28	1,390
Gemma-4 31B 2026.06	74.24	1,284
ϕ-Decoding + SC 2026.05	72.6	95.8
Phi-4 14B 2026.06	69.7	11,569
DDC 2026.05	69.6	13.9
Predictive Decoding + SC 2026.05	69.5	98.6
Qwen3 4B 2026.06	69.19	9,126
Phi-4 14B 2026.06	69.19	11,353
SHAPE 2026.06	67.68	-
GPT-4.1 2025.07	67.51	-
Qwen3 4B 2026.06	67.17	8,018
LessIsMore 2025.08	65.15	-
LessIsMore 2025.08	64.85	-
SMCS 2025.07	64.81	-
Self-MoA 2025.07	64.65	-
Full Attn 2025.08	64.02	-
Gemma-4 E4B 2026.06	64	3,254
LessIsMore 2025.08	63.61	-
LessIsMore 2025.08	63.42	-
SHAPE 2026.06	62.63	-
GPT-OSS-20B 2026.06	61.62	-
LessIsMore 2025.08	61.58	-
LessIsMore 2025.08	61.11	-
LessIsMore 2025.08	60.65	-
Full Attn 2025.08	60.54	-
Gemma-4 E4B 2026.06	60.33	3,342
SHAPE 2026.06	60.1	-
DDC 2026.05	59.1	13.7
Gemma-4 E4B 2026.06	58.67	3,646
LessIsMore 2025.08	58.62	-
Qwen3-30B-A3B 2026.06	58.59	-
QwQ-32B 2025.07	57.24	-
LessIsMore 2025.08	56.84	-
LessIsMore 2025.08	56.64	-
ϕ-Decoding + SC 2026.05	56.6	86.9
Phi-4 14B 2026.06	56.57	746
SHAPE 2026.06	56.57	-
LessIsMore 2025.08	56.48	-
LessIsMore 2025.08	56.23	-
Full Attn 2025.08	56.19	-
Gemma-4 E4B 2026.06	55.07	1,979
Predictive Decoding + SC 2026.05	54.8	94.3
Phi-4 14B 2026.06	54.55	658
Qwen3 4B 2026.06	48.48	549
Qwen3 4B 2026.06	46.67	737
LessIsMore 2025.08	43.31	-
LessIsMore 2025.08	43.08	-
Full Attn 2025.08	42.8	-
GRPO 2025.07	40.4	-
RL-PLUS 2025.07	40.4	-
LessIsMore 2025.08	39.11	-
LessIsMore 2025.08	35.74	-
PLM-HoneyBee-8B 2025.10	33.3	-
SHAPE 2026.06	31.52	-
DeepSeek-V2-Lite 2026.06	28.28	-
PLM-HoneyBee-3B 2025.10	27.7	-
SHAPE 2026.06	26.53	-
InternVL-2.5-8B 2025.10	26.3	-
Qwen2.5-VL-7B-Instruct 2025.10	26.3	-
PLM-HoneyBee-1B 2025.10	25.8	-
InternVL-2.5-4B 2025.10	25.3	-
SFT 2025.07	24.7	-
SFT+GRPO 2025.07	24.2	-
Repeated Sampling 2025.10	23.23	-
GUIDEDSAMPLING 2025.10	23.23	-
Qwen2.5-VL-3B-Instruct 2025.10	22.7	-
Repeated Sampling 2025.10	20.71	-
GUIDEDSAMPLING 2025.10	20.2	-
InternVL-3-8B-Instruct 2025.10	20.2	-
InternVL-3-1B-Instruct 2025.10	19.2	-
Tree-of-thought 2025.10	19.19	-
PLM-3B 2025.10	18.7	-
PLM-8B 2025.10	16.2	-
Base Model 2025.07	13.1	-
InternVL-2.5-1B 2025.10	12.1	-
PLM-1B 2025.10	7.1	-
Tree-of-thought 2025.10	7.07	-