General Reasoning on GSM8k, MATH 500, GPQA, SuperGPQA

Average Accuracy: 63.8

BF16

6.28821.21936.1551.081
Jan 20, 2026
Updated 1mo ago

Evaluation Results

Method    Links    Date       Average Accuracy
                   2026.01    63.8
                   2026.01    62.7
                   2026.01    60.9
                   2026.01    56.9
                   2026.01    55.9
                   2026.01    52.6
                   2026.01    29.6
                   2026.01    25.2
                   2026.01    23.2
                   2026.01    13
                   2026.01    8.5