Science Question Answering on GPQA Diamond
Best result: 91.9 Accuracy (Gemini-3.0)
[Chart: Accuracy and Output Token Count over time, Oct 4, 2025 to Dec 2, 2025]
Updated 4d ago
Evaluation Results
| Method | Details | Links | Accuracy | Output Token Count |
|---|---|---|---|---|
| Gemini-3.0 | variant=Pro, protocol=... | 2025.12 | 91.9 | 8,000 |
| GPT-5 | variant=High, protocol... | 2025.12 | 85.7 | 8,000 |
| DeepSeek-V3.2 | variant=Speciale, prot... | 2025.12 | 85.7 | 16,000 |
| Kimi-K2 | variant=Thinking, prot... | 2025.12 | 84.5 | 12,000 |
| DeepSeek-V3.2 | variant=Thinking, prot... | 2025.12 | 82.4 | 7,000 |
| Repeated Sampling | Model=Llama-3.2-3B-Ins... | 2025.10 | 23.23 | - |
| GUIDEDSAMPLING | Model=Llama-3.2-3B-Ins... | 2025.10 | 23.23 | - |
| Repeated Sampling | Model=Qwen2.5-3B-Instr... | 2025.10 | 20.71 | - |
| GUIDEDSAMPLING | Model=Qwen2.5-3B-Instr... | 2025.10 | 20.2 | - |
| Tree-of-thought | Model=Llama-3.2-3B-Ins... | 2025.10 | 19.19 | - |
| Tree-of-thought | Model=Qwen2.5-3B-Instr... | 2025.10 | 7.07 | - |