General Language Understanding on LLM Evaluation Suite (PiQA, ARC, HellaSwag, WinoGrande, MMLU v1)

74.6 Overall Accuracy

LLaMA-3-8B-Lizard

Jul 11, 2025
Updated 1mo ago

Evaluation Results

Method  Links
2025.07  74.6  72.4
2025.07  74.5  70.7
2025.07  74.5  72.2
2025.07  74.4  72.4
2025.07  74.2  70.7
2025.07  74.1  72.3
2025.07  73.1  72
2025.07  72.4  67.6
2025.07  71.1  65.8
2025.07  71    64.7
2025.07  70.9  65.1
2025.07  69.9  64
2025.07  69.6  63.8
2025.07  69.4  69.4
2025.07  68.2  64.1
2025.07  65.6  61.9