
General Language Modeling Evaluation on Aggregate Wino Hella ARC-E ARC-C MMLU

Average Accuracy: 56.67

Teacher (GPT-OSS-20B)

[Chart: average accuracy; y-axis ticks 28.954, 36.1495, 43.345, 50.5405; x-axis date May 27, 2026]
Updated 6d ago

Evaluation Results

Average Accuracy   Date
56.67              2026.05
33.71              2026.05
33.36              2026.05
32.86              2026.05
32.82              2026.05
32.15              2026.05
32.11              2026.05
31.72              2026.05
31.49              2026.05
30.46
30.02