
General Capability Evaluation on MMLU, GSM8K, HumanEval, IFEval

Top MMLU score: 78.21

SFT (supervised fine-tuning)

Apr 2, 2026

Evaluation Results

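| Date    | MMLU  | GSM8K | HumanEval | IFEval | Average |
|---------|-------|-------|-----------|--------|---------|
| 2026.04 | 78.21 | 88.02 | 83.54     | 45.66  | 73.86   |
| 2026.04 | 77.79 | 88.86 | 81.1      | 61     | 77.19   |
| 2026.04 | 77.54 | 85.67 | 85.98     | 61.92  | 77.78   |
| 2026.04 | 76.93 | 86.2  | 81.71     | 63.4   | 77.06   |
| 2026.04 | 76.43 | 90.75 | 82.93     | 61     | 77.78   |
| 2026.04 | 76.19 | 85.14 | 79.27     | 60.81  | 75.35   |
| 2026.04 | 76.01 | 87.64 | 79.88     | 60.81  | 76.09   |
| 2026.04 | 75.28 | 81.58 | 81.71     | 57.86  | 74.11   |
| 2026.04 | 50.7  | 51.86 | 40.85     | 45.84  | 47.31   |
| 2026.04 | 49.96 | 58.76 | 51.22     | 36.6   | 49.14   |
| 2026.04 | 48.62 | 49.73 | 44.51     | 45.1   | 46.99   |
| 2026.04 | 47.72 | 53.3  | 47.8      | 43.07  | 47.97   |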
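The last column of each results row appears to be the unweighted mean of the four benchmark scores: for every row, averaging MMLU, GSM8K, HumanEval, and IFEval reproduces the reported value to two decimals. The rows are also listed in descending MMLU order. The minimal sketch below checks both observations. The score tuples are transcribed from the Evaluation Results table, and the helper names (`check_reported_averages`, `rank_by_average`) are hypothetical, chosen for illustration only.

```python
import math
from statistics import mean

# One tuple per results row, in table order (descending MMLU):
# (MMLU, GSM8K, HumanEval, IFEval, reported Average)
ROWS = [
    (78.21, 88.02, 83.54, 45.66, 73.86),
    (77.79, 88.86, 81.1, 61.0, 77.19),
    (77.54, 85.67, 85.98, 61.92, 77.78),
    (76.93, 86.2, 81.71, 63.4, 77.06),
    (76.43, 90.75, 82.93, 61.0, 77.78),
    (76.19, 85.14, 79.27, 60.81, 75.35),
    (76.01, 87.64, 79.88, 60.81, 76.09),
    (75.28, 81.58, 81.71, 57.86, 74.11),
    (50.7, 51.86, 40.85, 45.84, 47.31),
    (49.96, 58.76, 51.22, 36.6, 49.14),
    (48.62, 49.73, 44.51, 45.1, 46.99),
    (47.72, 53.3, 47.8, 43.07, 47.97),
]


def unweighted_average(scores):
    """Plain arithmetic mean of the four benchmark scores (equal weights assumed)."""
    return mean(scores)


def check_reported_averages(rows, tol=0.006):
    """True if every reported Average equals the mean, allowing for 2-decimal rounding."""
    return all(
        math.isclose(unweighted_average(row[:4]), row[4], abs_tol=tol)
        for row in rows
    )


def rank_by_average(rows):
    """1-based table positions sorted by reported Average, best first.

    sorted() is stable even with reverse=True, so tied rows keep table order.
    """
    order = sorted(range(len(rows)), key=lambda i: rows[i][4], reverse=True)
    return [i + 1 for i in order]


if __name__ == "__main__":
    print(check_reported_averages(ROWS))
    print(rank_by_average(ROWS))
```

Ranking by Average instead of MMLU reorders the list noticeably: the row with the top MMLU score (78.21) drops to eighth, largely because of its comparatively low IFEval score (45.66).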