Scientific Evaluation on SciEval
Chart: Accuracy on SciEval (top entry: o4-mini, 87.5; Aug 26, 2025)
Evaluation Results
Model           | Configuration          | Date    | Accuracy | Delta
--------------- | ---------------------- | ------- | -------- | -----
o4-mini         | Reasoning Effort=High  | 2025.08 | 87.5     | 0.4
GPT-5           | Reasoning Effort=Low   | 2025.08 | 87.4     | -
o4-mini         | Reasoning Effort=Low   | 2025.08 | 87.1     | -
Gemini-2.5-Pro  | Reasoning Effort=Low   | 2025.08 | 86.4     | -
GPT-5           | Reasoning Effort=High  | 2025.08 | 86.1     | -1.3
Claude-Sonnet-4 | Reasoning Effort=Low   | 2025.08 | 85.8     | -
Claude-Sonnet-4 | Reasoning Effort=High  | 2025.08 | 85.8     | 0
Gemini-2.5-Pro  | Reasoning Effort=High  | 2025.08 | 85.1     | -1.3
o3              | Reasoning Effort=Low   | 2025.08 | 84.8     | -
o3-mini         | Reasoning Effort=Low   | 2025.08 | 83.8     | -
o3-mini         | Reasoning Effort=High  | 2025.08 | 83.4     | -0.4
o3              | Reasoning Effort=High  | 2025.08 | 82.7     | -2.1
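The page does not define the Delta column, but every listed value matches the High-effort accuracy minus the Low-effort accuracy for the same model (for example, o3: 82.7 - 84.8 = -2.1), with "-" shown on Low-effort rows. The sketch below reproduces the column under that reading, which is an inference from the numbers rather than a documented definition. The names `RESULTS` and `effort_delta` are hypothetical and exist only for illustration.

```python
# SciEval accuracy values copied from the results table, keyed by (model, reasoning effort).
# Assumption: Delta = High-effort accuracy minus Low-effort accuracy for the same model.
RESULTS = {
    ("o4-mini", "High"): 87.5,
    ("o4-mini", "Low"): 87.1,
    ("GPT-5", "High"): 86.1,
    ("GPT-5", "Low"): 87.4,
    ("Gemini-2.5-Pro", "High"): 85.1,
    ("Gemini-2.5-Pro", "Low"): 86.4,
    ("Claude-Sonnet-4", "High"): 85.8,
    ("Claude-Sonnet-4", "Low"): 85.8,
    ("o3", "High"): 82.7,
    ("o3", "Low"): 84.8,
    ("o3-mini", "High"): 83.4,
    ("o3-mini", "Low"): 83.8,
}


def effort_delta(results, model):
    """Return High-effort minus Low-effort accuracy, rounded to one decimal place
    to remove floating-point noise (e.g. 87.5 - 87.1 = 0.40000000000000568)."""
    return round(results[(model, "High")] - results[(model, "Low")], 1)


if __name__ == "__main__":
    # Print the delta for every model, with an explicit sign.
    for model in sorted({name for name, _ in RESULTS}):
        print(f"{model}: {effort_delta(RESULTS, model):+.1f}")
```

Under this reading, only o4-mini gains accuracy from high reasoning effort; the other models either stay flat (Claude-Sonnet-4) or lose accuracy.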