Reasoning on GPQA (Accuracy)

63.6 Accuracy

MOCHA

[Chart: Accuracy over time, Sep 2, 2025 to May 19, 2026]
Updated 14d ago

Evaluation Results

Date       Accuracy
2026.05    63.6
2026.05    59.2
2026.05    59.2
2026.05    59.2
2026.05    59.2
2025.09    32.81
2025.09    32.37
2025.09    31.92
2025.09    30.8
2025.09    30.8
2025.09    30.58
2025.09    30.58
2025.09    30.13
2025.09    29.91
2025.09    29.24
2025.09    28.79
2025.09    28.79
2025.09    28.57
2025.09    28.57
2025.09    28.35
2025.09    28.35
2025.09    28.12
2025.09    27.9
2025.09    27.68
2025.09    27.23
2025.09    26.34
2025.09    26.12
2025.09    25
2025.09    23.88
2025.09    23.44
2025.09    22.32
2026.05    22.2
2025.09    18.53
2026.05    17.7
2026.05    17.7
2026.05    10.1
2026.05    5.1