General Reasoning on GPQA-Diamond & MMLU-Pro

57.6 Accuracy

[Chart: results plotted over May 10, 2026 to May 12, 2026]
Updated 21d ago

Evaluation Results

Method Links
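2026.05 | 57.6 | - | - | 0.6
2026.05 | 57.6 | - | - | 0
2026.05 | 57 | - | - | 0.2
2026.05 | 56.8 | - | - | 0
2026.05 | 56 | - | - | -13
2026.05 | 53.6 | - | - | -1.4
2026.05 | 48.72 | 5,410.61 | 0.49 | -
2026.05 | 43.83 | 3,617.53 | 0.55 | -
2026.05 | 43.22 | 3,416.07 | 0.55 | -
2026.05 | 42.2 | 5,293.27 | 0.38 | -
2026.05 | 40.36 | 4,108.68 | 0.4 | -
2026.05 | 39.81 | 4,516.54 | 0.36 | -
2026.05 | 39.6 | - | - | 3.4
2026.05 | 38.45 | 3,816.83 | 0.39 | -
2026.05 | 38.1 | - | - | 1.9
2026.05 | 37.2 | - | - | 0.7
2026.05 | 37.1 | - | - | 0
2026.05 | 36.6 | - | - | 0.8
2026.05 | 32.4 | - | - | -1.8
2026.05 | 31.95 | 5,403.76 | 0.22 | -
2026.05 | 29.4 | - | - | -23.3
2026.05 | 28.09 | 4,468.94 | 0.22 | -
2026.05 | 28.02 | 4,634.17 | 0.18 | -
2026.05 | 27.28 | 5,794.19 | 0.14 | -
2026.05 | 26.4 | - | - | 11.3
2026.05 | 26.3 | - | - | 10.6
2026.05 | 25.97 | 4,064.84 | 0.17 | -
2026.05 | 25.93 | 4,534.66 | 0.17 | -
2026.05 | 24.1 | - | - | 8.5
2026.05 | 23.97 | 4,656.37 | 0.13 | -
2026.05 | 17.9 | - | - | 0
2026.05 | 16 | - | - | -1.1
2026.05 | 11.7 | - | - | -3.4
2026.05 | 11.6 | - | - | -7.3
2026.05 | 9.5 | - | - | -24.2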