
General Capability on Aggregate (GPQA-D, GSM8K, HumanEval, MATH-500, MBPP, MMLU-Pro)

Average Accuracy: 75.9

[Chart: accuracy axis ticks at 35.86, 46.255, 56.65, 67.045; date axis: May 10, 2026]
Updated 22d ago

Evaluation Results

Method    Links
2026.05   75.9    -      -
2026.05   75.9    -      -
2026.05   75.6    -      -
2026.05   74.6    -      -
2026.05   74.6    -      -
2026.05   74.32   1.39   0.8333
2026.05   74.3    -      -
2026.05   74.3    -      -
2026.05   73.5    -      -
2026.05   73.5    -      -
2026.05   73.3    -      -
2026.05   72.93   -      -
2026.05   72.9    -      -
2026.05   72.9    -      -
2026.05   72.6    -      -
2026.05   72.54   1.61   0.5
2026.05   72.5    -      -
2026.05   72.5    -      -
2026.05   72.2    -      -
2026.05   72.2    -      -
2026.05   71.6    -      -
2026.05   71.3    -      -
2026.05   71.3    -      -
2026.05   70.93   -      -
2026.05   70.9    -      -
2026.05   70.5    -      -
2026.05   70.45   -      -
2026.05   70.4    -      -
2026.05   70.4    -      -
2026.05   70.3    -      -
2026.05   70.3    -      -
2026.05   70.27   -0.18  1
2026.05   70.2    -      -
2026.05   69.6    -      -
2026.05   68.7    -      -
2026.05   68.6    -      -
2026.05   65.6    -      -
2026.05   64.7    -      -
2026.05   58.3    -      -
2026.05   58.3    -      -
2026.05   58.3    -      -
2026.05   58.3    -      -
2026.05   58.26   1.44   1
2026.05   57.1    -      -
2026.05   57.1    -      -
2026.05   56.82   -      -
2026.05   56.8    -      -
2026.05   56.8    -      -
2026.05   54.7    -      -
2026.05   54.7    -      -
2026.05   53.4    -      -
2026.05   53.4    -      -
2026.05   52.4    -      -
2026.05   52.4    -      -
2026.05   51.9    -      -
2026.05   51.9    -      -
2026.05   40.7    -      -
2026.05   40.1    -      -
2026.05   40.07   -      -
2026.05   40      -      -
2026.05   40      -      -
2026.05   39.96   -0.11  0.5
2026.05   39.6    -      -
2026.05   38.9    -      -
2026.05   38.8    -      -
2026.05   37.4    -      -