
Zero-shot Evaluation on MMLU, HellaSwag, PIQA, BoolQ, WinoGrande, ARC-E, ARC-C, and OBQA (500 samples/task)

Best MMLU: 74.4 (Unpruned)

[Chart: scores plotted by date; data point at Apr 26, 2026]
Updated 1mo ago

Evaluation Results

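| Method | Date | MMLU | HellaSwag | PIQA | BoolQ | WinoGrande | ARC-E | ARC-C | OBQA | Avg. |
|---|---|---|---|---|---|---|---|---|---|---|
| Unpruned | 2026.04 | 74.4 | 71.4 | 85.4 | 87 | 67.4 | 91.6 | 67.2 | 89 | 79.2 |
| | 2026.04 | 44 | 49 | 73 | 72.8 | 53.2 | 65.2 | 37.6 | 43.2 | 54.8 |
| | 2026.04 | 43.4 | 49.4 | 73 | 73.4 | 53 | 64.8 | 37.4 | 43 | 54.6 |
| | 2026.04 | 42.4 | 48.8 | 74 | 72.2 | 53.2 | 65.4 | 38 | 43 | 54.6 |
| | 2026.04 | 34.2 | 49.2 | 77.2 | 69 | 58 | 60.4 | 35.4 | 40.2 | 53 |
| | 2026.04 | 32.2 | 42.6 | 67.6 | 52.2 | 49.6 | 44 | 25.2 | 30.8 | 43 |
| | 2026.04 | 28 | 39.6 | 72.6 | 62.2 | 52.4 | 40.8 | 29.4 | 31.8 | 44.6 |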
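
The last value in each row appears to be the unweighted mean of the eight task accuracies. The page does not state this; it is inferred from the numbers. A minimal Python sketch checking it for the Unpruned row follows. It assumes the scores appear in the order the title lists the tasks, and the helper name `macro_average` is hypothetical.

```python
# Task order is assumed to follow the page title; the page does not label the columns.
TASKS = ["MMLU", "HellaSwag", "PIQA", "BoolQ", "WinoGrande", "ARC-E", "ARC-C", "OBQA"]

# Per-task accuracies (%) for the Unpruned row, as listed on the page.
unpruned = [74.4, 71.4, 85.4, 87.0, 67.4, 91.6, 67.2, 89.0]
reported_average = 79.2  # last value in the Unpruned row


def macro_average(scores):
    """Unweighted mean over tasks (hypothetical helper).

    Every task uses 500 samples, so an unweighted mean over tasks
    equals a sample-weighted mean here.
    """
    return sum(scores) / len(scores)


mean = macro_average(unpruned)
print(f"computed mean: {mean:.3f}")

# The page shows one decimal place, so allow half a unit in that place.
print("matches reported value:", abs(mean - reported_average) <= 0.05)
```

With 500 samples per task, each sample is worth 0.2 percentage points, so differences of a few tenths between rows amount to only one or two examples per task.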