Scientific Knowledge Evaluation on SciKnowEval
Top result: o3, Accuracy 52.1
[Chart: Accuracy and Delta Score, dated Aug 26, 2025]
Updated 5d ago
Evaluation Results
| Method | Reasoning Effort | Date | Accuracy | Delta Score |
|---|---|---|---|---|
| o3 | Low | 2025.08 | 52.1 | - |
| o3 | High | 2025.08 | 51.9 | -0.2 |
| o3-mini | High | 2025.08 | 51.9 | 2.9 |
| o4-mini | High | 2025.08 | 51.1 | 1.2 |
| o4-mini | Low | 2025.08 | 49.9 | - |
| o3-mini | Low | 2025.08 | 49 | - |
| Gemini-2.5-Pro | High | 2025.08 | 47.6 | 0.8 |
| Gemini-2.5-Pro | Low | 2025.08 | 46.8 | - |
| GPT-5 | High | 2025.08 | 46.7 | 1.2 |
| GPT-5 | Low | 2025.08 | 45.5 | - |
| Claude-Sonnet-4 | Low | 2025.08 | 43.6 | - |
| Claude-Sonnet-4 | High | 2025.08 | 43.3 | -0.3 |
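The page does not define Delta Score. Judging from the numbers alone, each reported value matches the High-effort accuracy minus the Low-effort accuracy for the same model (for example, o3-mini: 51.9 - 49 = 2.9). The sketch below checks that reading against the results above; the definition is an inference, and the names `ACCURACY` and `delta_score` are hypothetical, not part of the leaderboard.

```python
# Accuracy values copied from the results above: model -> (low effort, high effort).
ACCURACY = {
    "o3": (52.1, 51.9),
    "o3-mini": (49.0, 51.9),
    "o4-mini": (49.9, 51.1),
    "Gemini-2.5-Pro": (46.8, 47.6),
    "GPT-5": (45.5, 46.7),
    "Claude-Sonnet-4": (43.6, 43.3),
}


def delta_score(low: float, high: float) -> float:
    """Assumed definition: High-effort accuracy minus Low-effort accuracy.

    Rounded to one decimal to match the precision shown on the page.
    """
    return round(high - low, 1)


deltas = {model: delta_score(low, high) for model, (low, high) in ACCURACY.items()}

for model, delta in deltas.items():
    print(f"{model}: {delta:+.1f}")
```

Under this reading, a negative Delta Score (o3, Claude-Sonnet-4) means the High setting scored slightly below the Low setting on this benchmark.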