General Reasoning Average on Aggregate (OBQA, CSQA, SIQA, ARC, MMLU, GSM8K-MC, AQUA)

86.21 Average Accuracy

IoT

[Chart: Average Accuracy over time; y-axis ticks 71.3588, 75.2144, 79.07, 82.9256; x-axis label Mar 15, 2026]
Updated 10d ago

Evaluation Results

Date      Average Accuracy
2026.03   86.21
2026.03   85.23
2026.03   80.69
2026.03   80.04
2026.03   78.75
2026.03   77.35
2026.03   76.13
2026.03   75.38
2026.03   75.22
2026.03   74.89
2026.03   74.81
2026.03   72.38
2026.03   72.28
2026.03   71.93