General Capability Evaluation on MMLU, GSM8K, HumanEval, IFEval

77.78 Common Average Score

NOVA

Chart: Common Average Score over time, Apr 2, 2026 to Jun 1, 2026.
Updated 1d ago

Evaluation Results

Date     Common Average  MMLU   GSM8K  HumanEval  IFEval
2026.04  77.78           76.43  90.75  82.93      61
2026.04  77.78           77.54  85.67  85.98      61.92
2026.04  77.19           77.79  88.86  81.1       61
2026.04  77.06           76.93  86.2   81.71      63.4
2026.04  76.09           76.01  87.64  79.88      60.81
2026.04  75.35           76.19  85.14  79.27      60.81
2026.04  74.11           75.28  81.58  81.71      57.86
2026.04  73.86           78.21  88.02  83.54      45.66
2026.06  72.59           -      -      -          -
2026.06  70.08           -      -      -          -
2026.06  69.84           -      -      -          -
2026.06  69.32           -      -      -          -
2026.06  68.93           -      -      -          -
2026.06  68.92           -      -      -          -
2026.06  68.7            -      -      -          -
2026.06  67.51           -      -      -          -
2026.06  64.58           -      -      -          -
2026.06  56.21           -      -      -          -
2026.06  53.41           -      -      -          -
2026.06  53.02           -      -      -          -
2026.06  52.77           -      -      -          -
2026.06  52.49           -      -      -          -
2026.06  52.01           -      -      -          -
2026.06  52              -      -      -          -
2026.06  51.46           -      -      -          -
2026.04  49.14           49.96  58.76  51.22      36.6
2026.04  47.97           47.72  53.3   47.8       43.07
2026.04  47.31           50.7   51.86  40.85      45.84
2026.04  46.99           48.62  49.73  44.51      45.1
2026.06  44.73           -      -      -          -
2026.06  43.3            -      -      -          -
2026.06  43.19           -      -      -          -
2026.06  43.04           -      -      -          -
2026.06  42.8            -      -      -          -
2026.06  42.46           -      -      -          -
2026.06  42.21           -      -      -          -
2026.06  41.73           -      -      -          -
2026.06  41.53           -      -      -          -
2026.06  37.18           -      -      -          -
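
The page does not state how the Common Average Score is derived. In every row that reports all four benchmark scores, the leading value is consistent with an unweighted mean of those four scores. The sketch below checks that reading on a few rows copied from the results above. The function names `common_average` and `matches_reported` are hypothetical, and the mean-of-four definition is an assumption inferred from the numbers, not something the source confirms.

```python
# Rows copied from the results above: (reported average, four benchmark scores).
# Assumption: the four scores follow the order in the page title
# (MMLU, GSM8K, HumanEval, IFEval).
rows = [
    (77.78, [76.43, 90.75, 82.93, 61.0]),
    (77.78, [77.54, 85.67, 85.98, 61.92]),
    (73.86, [78.21, 88.02, 83.54, 45.66]),
    (49.14, [49.96, 58.76, 51.22, 36.6]),
]


def common_average(scores):
    """Unweighted mean of the per-benchmark scores (assumed definition)."""
    return sum(scores) / len(scores)


def matches_reported(reported, scores, tol=0.01):
    """True if the recomputed mean is within `tol` of the reported value."""
    return abs(common_average(scores) - reported) <= tol


for reported, scores in rows:
    # Print the reported value, the recomputed mean, and whether they agree.
    print(reported, common_average(scores), matches_reported(reported, scores))
```

A tolerance is used instead of exact equality because the page shows values rounded to two decimals and its rounding rule is not known. Rows that list only an average cannot be checked this way.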