
Scientific Reasoning on TheoremQA

Accuracy: 82.49

[Chart: Upper Bound accuracy over time, Aug 29, 2025 to Apr 6, 2026]
Updated 1mo ago

Evaluation Results

Date     Accuracy  Links
2025.08  82.49     -
2025.08  55.22     -
2025.08  48.59     -
2025.08  44.71     -
2025.08  43.47     -
2025.08  42.93     -
2025.08  42.57     -
2026.04  42.3      6,566
2026.04  42.1      10,754
2025.08  41.81     -
2026.04  41.5      4,987
2025.08  41.5      -
2026.04  40.9      3,141
2025.08  40.83     -
2026.04  40.8      5,375
2025.08  40.7      -
2026.04  40.3      5,988
2025.08  40.03     -
2026.04  40        3,975
2026.04  39.9      2,320
2026.04  39.8      5,916
2025.08  39.76     -
2026.04  39.5      2,578
2025.08  39.38     -
2026.04  38.1      1,778
2025.08  37.846    -
2025.08  36.76     -
2025.08  36.46     -
2025.08  36.46     -
2025.08  36.37     -
2025.08  36.32     -
2025.08  36.32     -
2025.08  35.88     -
2025.08  34.94     -
2025.08  33.11     -
2025.08  32.75     -
2025.08  32.75     -
2025.08  31.91     -
2025.08  31.59     -
2025.08  31.46     -
2025.08  31.46     -
2025.08  31.42     -
2025.08  31.24     -
2025.08  31.24     -
2025.08  29.76     -
2025.08  29.76     -
2025.08  29.41     -
2025.08  29.17     -
2025.08  28.2      -
2025.08  28.07     -
2025.08  28.02     -
2025.08  27.98     -
2025.08  27.84     -
2025.08  27.84     -
2025.08  26.73     -
2025.08  26.58     -
2025.08  23.16     -
2025.08  22.71     -
2025.08  21.02     -
2025.08  20.08     -
2025.08  20.08     -
2025.08  19.72     -
2025.08  18.88     -
2025.08  18.62     -
2025.08  18.62     -
2025.08  18.03     -
2025.08  17.45     -
2025.08  17.09     -