Scientific Problem Solving on SciBench (Accuracy, Δ)
[Chart: Accuracy by method. Highlighted point: o3, 72.1 accuracy, Aug 26, 2025. Metrics shown: Accuracy, Delta (Δ).]
Updated 5d ago
Evaluation Results
| Method          | Setting               | Links   | Accuracy | Delta (Δ) |
|-----------------|-----------------------|---------|----------|-----------|
| o3              | Reasoning Effort=High | 2025.08 | 72.1     | 2.4       |
| GPT-5           | Reasoning Effort=High | 2025.08 | 72       | 1.6       |
| Gemini-2.5-Pro  | Reasoning Effort=Low  | 2025.08 | 71       | -         |
| GPT-5           | Reasoning Effort=Low  | 2025.08 | 70.4     | -         |
| Gemini-2.5-Pro  | Reasoning Effort=High | 2025.08 | 70.2     | -0.8      |
| o3              | Reasoning Effort=Low  | 2025.08 | 69.7     | -         |
| o4-mini         | Reasoning Effort=High | 2025.08 | 69.7     | 4.2       |
| Claude-Sonnet-4 | Reasoning Effort=High | 2025.08 | 67.1     | 1.6       |
| o3-mini         | Reasoning Effort=High | 2025.08 | 66.3     | 20.3      |
| o4-mini         | Reasoning Effort=Low  | 2025.08 | 65.5     | -         |
| Claude-Sonnet-4 | Reasoning Effort=Low  | 2025.08 | 65.5     | -         |
| o3-mini         | Reasoning Effort=Low  | 2025.08 | 46       | -         |
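The page does not define Delta (Δ). The reported values are, however, consistent with the High-effort accuracy minus the Low-effort accuracy for the same model. The sketch below checks that reading against the numbers in the results above. The interpretation is an assumption, and the names `ACCURACY`, `REPORTED_DELTA`, and `effort_delta` are hypothetical helpers introduced only for illustration.

```python
# Accuracy values copied from the results above: (model, reasoning effort) -> accuracy.
ACCURACY = {
    ("o3", "High"): 72.1, ("o3", "Low"): 69.7,
    ("GPT-5", "High"): 72.0, ("GPT-5", "Low"): 70.4,
    ("Gemini-2.5-Pro", "High"): 70.2, ("Gemini-2.5-Pro", "Low"): 71.0,
    ("o4-mini", "High"): 69.7, ("o4-mini", "Low"): 65.5,
    ("Claude-Sonnet-4", "High"): 67.1, ("Claude-Sonnet-4", "Low"): 65.5,
    ("o3-mini", "High"): 66.3, ("o3-mini", "Low"): 46.0,
}

# Delta values as reported on the High-effort rows.
REPORTED_DELTA = {
    "o3": 2.4,
    "GPT-5": 1.6,
    "Gemini-2.5-Pro": -0.8,
    "o4-mini": 4.2,
    "Claude-Sonnet-4": 1.6,
    "o3-mini": 20.3,
}


def effort_delta(model: str) -> float:
    """Assumed definition: High-effort accuracy minus Low-effort accuracy."""
    high = ACCURACY[(model, "High")]
    low = ACCURACY[(model, "Low")]
    # Round to one decimal place to match the precision of the reported values.
    return round(high - low, 1)


if __name__ == "__main__":
    for model, reported in REPORTED_DELTA.items():
        computed = effort_delta(model)
        status = "matches" if computed == reported else "differs"
        print(f"{model}: computed {computed}, reported {reported} ({status})")
```

Under this reading, the only negative value (Gemini-2.5-Pro, -0.8) reflects a model whose Low-effort run scored higher than its High-effort run, and the largest value (o3-mini, 20.3) reflects the widest gap between the two settings.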