
Scientific Reasoning on GPQA (Accuracy & Generation Length)

70.2 Accuracy

MCTS with Const-o-T

[Chart: accuracy over time, May 27, 2025 to Feb 20, 2026]
Updated 18d ago

Evaluation Results

Date      Accuracy  Generation Length
2025.10   70.2      -
2025.10   67.1      -
2025.10   66.6      -
2025.10   65.9      -
2025.10   65.9      -
2025.10   65.6      -
2025.10   65.4      -
2025.10   65.1      -
2025.10   64.1      -
2025.10   52        -
2025.10   50.5      -
2025.10   48.4      -
2026.01   45.5      179.3
2026.01   45.5      115.7
2025.05   45        -
2025.05   43.94     -
2026.01   43.6      130.8
2026.01   43.6      8
2026.01   43.6      8
2025.10   43        -
2025.10   42.9      -
2026.01   42.8      6
2025.05   42.57     -
2025.10   41.9      -
2026.01   41.8      8
2026.01   41.8      8
2025.10   40.9      -
2026.01   40        229.5
2026.01   39.6      0
2026.02   39.1      -
2025.10   38.5      -
2026.01   38.1      89.3
2026.01   37.1      30.8
2026.01   36.4      8
2026.01   36.4      8
2026.02   36.4      -
2025.10   36.3      -
2026.01   34.5      9,492
2026.01   34.5      150.2
2026.01   34.5      8
2026.02   34.3      -
2026.02   32.8      -
2026.01   32.7      8
2026.02   31.9      -
2026.02   31.8      -
2026.01   31.4      6
2026.01   30.7      0
2026.01   28.9      0
2026.01   28.8      3,670
2026.01   28.6      6
2026.01   28.3      1,708
2026.01   27.8      3,815
2026.01   27.7      28.7
2026.01   27.3      5,589
2026.01   27.3      8
2026.01   27.1      6
2026.01   26.8      2,129
2026.01   26        6
2026.01   24.2      2,188
2026.02   24.2      -
2026.01   20.7      1,044
2026.01   20.4      33.9
2026.01   20        3,655
2026.01   19.7      743
2026.01   19.2      4,569
2025.05   13.87     -
2025.05   13.64     -
2025.05   11.11     -
2025.12   5.25      -
2025.12   4.35      -
2025.12   4.35      -
2025.12   3.99      -
2025.12   3.62      -
2025.12   3.44      -
2025.12   3.1       -