Olympiad-level Science Problem Solving on OlympBench
Best result: 69.6 accuracy (Gemini-2.5-Pro). [Chart: Accuracy and Delta over time; latest point dated Aug 26, 2025.]
Evaluation Results
Model            Reasoning effort  Date     Accuracy  Delta
Gemini-2.5-Pro   High              2025.08  69.6      2.1
Gemini-2.5-Pro   Low               2025.08  67.5      -
GPT-5            High              2025.08  64.9      4.8
GPT-5            Low               2025.08  60.0      -
Claude-Sonnet-4  High              2025.08  59.8      4.4
o3               High              2025.08  58.0      4.5
Claude-Sonnet-4  Low               2025.08  55.4      -
o3               Low               2025.08  53.5      -
o3-mini          High              2025.08  51.1      11.6
o4-mini          High              2025.08  49.6      9.2
o4-mini          Low               2025.08  40.4      -
o3-mini          Low               2025.08  39.5      -

Delta is listed only on High-effort rows. It matches each model's High minus Low accuracy (for GPT-5, 64.9 - 60.0 = 4.9 against the listed 4.8, presumably a rounding difference in the underlying scores).
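The Delta column can be cross-checked with simple arithmetic. The sketch below assumes Delta is the High-effort accuracy minus the Low-effort accuracy for the same model. That is an inference from the table, not something the page states. It recomputes each gap from the listed scores and compares it with the reported value. The names `SCORES`, `REPORTED_DELTA`, `effort_gain`, and `compare_deltas` are hypothetical and are used here only for illustration.

```python
# Hypothetical check of the leaderboard's Delta column.
# Assumption: Delta = accuracy(High effort) - accuracy(Low effort) for the same model.

# Accuracy (%) as listed in the table: (High effort, Low effort).
SCORES = {
    "Gemini-2.5-Pro": (69.6, 67.5),
    "GPT-5": (64.9, 60.0),
    "Claude-Sonnet-4": (59.8, 55.4),
    "o3": (58.0, 53.5),
    "o3-mini": (51.1, 39.5),
    "o4-mini": (49.6, 40.4),
}

# Delta as printed on each model's High-effort row.
REPORTED_DELTA = {
    "Gemini-2.5-Pro": 2.1,
    "GPT-5": 4.8,
    "Claude-Sonnet-4": 4.4,
    "o3": 4.5,
    "o3-mini": 11.6,
    "o4-mini": 9.2,
}


def effort_gain(high: float, low: float) -> float:
    """Accuracy gained by raising reasoning effort, rounded to one decimal
    (rounding also hides float noise such as 69.6 - 67.5 = 2.0999...)."""
    return round(high - low, 1)


def compare_deltas() -> dict:
    """Map each model to (recomputed gap, reported Delta)."""
    return {
        model: (effort_gain(high, low), REPORTED_DELTA[model])
        for model, (high, low) in SCORES.items()
    }


if __name__ == "__main__":
    # Print models from largest to smallest recomputed gain.
    rows = sorted(compare_deltas().items(), key=lambda kv: -kv[1][0])
    for model, (recomputed, reported) in rows:
        note = "" if abs(recomputed - reported) < 0.05 else "  (differs; likely rounding)"
        print(f"{model:16s} recomputed {recomputed:5.1f}  reported {reported:5.1f}{note}")
```

Under this assumption, five of the six gaps reproduce exactly. GPT-5 recomputes to 4.9 rather than 4.8, which suggests the leaderboard derives Delta from unrounded scores. The smaller models (o3-mini, o4-mini) gain the most from higher reasoning effort.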