Scientific Problem Solving on SciBench (Accuracy, Δ)

72.1 Accuracy (o3)

Aug 26, 2025
Updated 5d ago

Evaluation Results

Date     Accuracy  Δ
2025.08  72.1      2.4
2025.08  72        1.6
2025.08  71        -
2025.08  70.4      -
2025.08  70.2      -0.8
2025.08  69.7      -
2025.08  69.7      4.2
2025.08  67.1      1.6
2025.08  66.3      20.3
2025.08  65.5      -
2025.08  65.5      -
2025.08  46        -